Re: Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-15 Thread Takuya UESHIN
> The vote is open until April 17th 1AM (PST) and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Use ANSI SQL mode by default
> [ ] -1 Do not use ANSI SQL mode by default because ...
>
> Thank you in advance.
>
> Dongjoon
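For context, a minimal sketch of what the flag changes, assuming the standard spark.sql.ansi.enabled configuration (the exact error class depends on the Spark version):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # ANSI mode off: invalid arithmetic silently yields NULL.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT 1 / 0 AS x").show()    # x is NULL

    # ANSI mode on: the same query fails at runtime instead.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    # spark.sql("SELECT 1 / 0 AS x").show()  # raises a DIVIDE_BY_ZERO error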

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Takuya UESHIN
> PR <https://github.com/apache/spark/pull/45053>
> SPIP doc <https://docs.google.com/document/d/1Pund40wGRuB72LX6L7cliMDVoXTPR-xx4IkPmMLaZXk/edit?usp=sharing>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
>
> Thanks.
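A sketch of the user experience the SPIP aims at (the package name and port are assumptions based on the proposal, not confirmed by this thread):

    # Install a pure Python distribution with no bundled jars (name assumed):
    #   pip install pyspark-connect

    from pyspark.sql import SparkSession

    # Connect to a remote Spark Connect server; 15002 is the usual default port.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    spark.range(3).show()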

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Takuya UESHIN
+1.

See https://youtu.be/yj7XlTB1Jvc?t=604 :-).

On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Python Data Source API.
>
> The high-level summary for the SPIP is that it aims to introduce a simple API in Python for Data Sources. The idea is to enable Python developers to create data sources without learning Scala or dealing with the complexities of the current data source APIs. This would make Spark more accessible to the wider Python developer community.
>
> References:
> - SPIP doc <https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing>
> - JIRA ticket <https://issues.apache.org/jira/browse/SPARK-44076>
> - Discussion thread <https://lists.apache.org/thread/w621zn14ho4rw61b0s139klnqh900s8y>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because __.
>
> Thanks,
> Allison
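For readers landing here from the archive: the API that later shipped looks roughly like this minimal sketch (assuming the pyspark.sql.datasource module as released in later Spark versions; details evolved after the vote):

    from pyspark.sql.datasource import DataSource, DataSourceReader

    class CounterReader(DataSourceReader):
        def read(self, partition):
            # Yield plain tuples matching the declared schema.
            for i in range(3):
                yield (i,)

    class CounterDataSource(DataSource):
        @classmethod
        def name(cls):
            return "counter"

        def schema(self):
            return "value int"

        def reader(self, schema):
            return CounterReader()

    # Usage (on a session):
    #   spark.dataSource.register(CounterDataSource)
    #   spark.read.format("counter").load().show()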

Re: Welcoming three new PMC members

2022-08-09 Thread Takuya UESHIN
> ...add three new PMC members. Join me in welcoming them to their new roles!
>
> New PMC members: Huaxin Gao, Gengliang Wang and Maxim Gekk
>
> The Spark PMC

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Takuya UESHIN
...Hyukjin Kwon wrote:

> Hi all,
>
> The Spark PMC recently added Xinrong Meng as a committer on the project. Xinrong is a major contributor to PySpark, especially the Pandas API on Spark. She has guided a lot of new contributors enthusiastically. Please join me in welcoming Xinrong!

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-27 Thread Takuya Ueshin
-1

I found a correctness issue in ArrayAggregate; the fix was merged after the RC3 cut.

- https://issues.apache.org/jira/browse/SPARK-39293
- https://github.com/apache/spark/pull/36674

Thanks.

On Tue, May 24, 2022 at 10:21 AM Maxim Gekk wrote:

> Please vote on releasing the following...
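For context, ArrayAggregate backs the aggregate higher-order function; a sketch of the kind of call affected (illustrative only, not the exact reproduction from SPARK-39293):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([([1, 2, 3],)], ["xs"])

    # Fold the array with a start value and a merge function.
    df.select(
        F.aggregate("xs", F.lit(0), lambda acc, x: acc + x).alias("total")
    ).show()  # total = 6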

Re: PySpark Dynamic DataFrame for easier inheritance

2021-12-29 Thread Takuya Ueshin
I'm afraid I'm also against the proposal so far.

What's wrong with going with "1. Functions" and using transform, which allows chaining functions? I was not sure what you mean by "manage the namespaces", though. For example (completed in the sketch that follows):

def with_price(df, factor: float = 2.0):
    return df.withColumn("price", ...
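A runnable version of that chaining pattern (the body of with_price is an assumed completion; the archived snippet is truncated):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    def with_price(df, factor: float = 2.0):
        # Assumed completion of the truncated example above.
        return df.withColumn("price", F.col("base_price") * factor)

    def with_tax(df, rate: float = 0.1):
        return df.withColumn("tax", F.col("price") * rate)

    df = spark.createDataFrame([(100.0,)], ["base_price"])

    # Plain functions chain cleanly without any DataFrame subclassing.
    df.transform(with_price).transform(with_tax).show()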

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-29 Thread Takuya UESHIN
...AM Liang-Chi Hsieh wrote:

> +1 (non-binding)
>
> rxin wrote:
>> +1. Would open up a huge persona for Spark.
>>
>> On Fri, Mar 26 2021 at 11:30 AM, Bryan Cutler wrote:
>>> +1 (non-binding)
>>>
>>> On Fri, Mar 26, 2021 at 9:49 AM Maciej wrote:
>>>> +1 (non-binding)
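The layer voted on here shipped as pyspark.pandas in Spark 3.2; a minimal sketch of the resulting API (assuming Spark >= 3.2):

    import pyspark.pandas as ps

    # pandas-like API, Spark execution underneath.
    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
    print(psdf.mean())    # column means, computed by Spark

    # Convert to a plain Spark DataFrame when needed.
    sdf = psdf.to_spark()
    sdf.show()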

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread Takuya UESHIN
> ...
> - Jungtaek Lim
> - Dilip Biswal
>
> All three of them contributed to Spark 3.0 and we're excited to have them join the project.
>
> Matei and the Spark PMC

[OSS DIGEST] The major changes of Apache Spark from May 6 to May 19

2020-06-09 Thread Takuya Ueshin
Hi all,

This is the bi-weekly Apache Spark digest from the Databricks OSS team. For each API/configuration/behavior change, there will be an *[API]* tag in the title.

CORE

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Takuya UESHIN
> ...thinking about what the cost will be:
>
> Usage - an API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are a bunch of ways that we can estimate:
> - How long has the API been in Spark?
> - Is the API common even for basic programs?
> - How often do we see recent questions in JIRA or mailing lists?
> - How often does it appear in StackOverflow or blogs?
>
> Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
> - Will there be a compiler or linker error?
> - Will there be a runtime exception?
> - Will that exception happen after significant processing has been done?
> - Will we silently return different answers? (very hard to debug, might not even notice!)
>
> Cost of Maintaining an API
>
> Of course, the above does not mean that we will never break any APIs. We must also consider the cost both to the project and to our users of keeping the API in question.
>
> Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc). In some cases, while not completely technically infeasible, the cost of maintaining a particular API can become too high.
>
> User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.
>
> Alternatives to Breaking an API
>
> In cases where there is a "Bad API", but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.
>
> Avoid Bad APIs - While this is a bit obvious, it is an important point. Anytime we are adding a new interface to Spark we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.
>
> Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated.
>
> Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In the cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.
>
> Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them, to reduce the cost of eventually removing deprecated APIs.
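To make the "deprecation warnings should point to a clear alternative" point concrete, a small sketch (the function names are hypothetical, not taken from the policy text):

    import warnings

    def new_helper(df):
        return df  # the recommended replacement (stub for illustration)

    def old_helper(df):
        # Good: names the release and the replacement, not just "deprecated".
        warnings.warn(
            "old_helper is deprecated as of 3.0.0 and will be removed; "
            "use new_helper instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_helper(df)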

Re: [DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-07 Thread Takuya UESHIN
> [1] https://github.com/apache/spark/pull/20280
> [2] https://www.python.org/dev/peps/pep-0468/
> [3] https://issues.apache.org/jira/browse/SPARK-29748
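A sketch of the behavior change under discussion (Python 3.6+ preserves keyword-argument order per PEP 468; the exact legacy toggle varied by version):

    from pyspark.sql import Row

    # Legacy behavior: fields were sorted alphabetically, so Row(b=1, a=2)
    # became Row(a=2, b=1) regardless of how it was written.
    # After the change, the written order is preserved:
    r = Row(b=1, a=2)
    print(r)  # Row(b=1, a=2) once sorting is removed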

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-31 Thread Takuya UESHIN
> Hi everyone,
>
> While deprecation of Python 2 in 3.0.0 has been announced <https://spark.apache.org/new...

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Takuya UESHIN
> ...including ML, SQL, and data sources, so it's great to have them here. All the best,
>
> Matei and the Spark PMC

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Takuya UESHIN
...Burak Yavuz wrote:

> Congrats Jose!
>
> On Tue, Jan 29, 2019 at 10:50 AM Xiao Li wrote:
>> Congratulations!
>>
>> Xiao
>>
>> Shixiong Zhu wrote on Tue, Jan 29, 2019 at 10:48 AM:
>>> Hi all,
>>>
>>> The Apache Spark PMC recently added Jose Torres as a committer on the project. Jose has been a major contributor to Structured Streaming. Please join me in welcoming him!
>>>
>>> Best Regards,
>>> Shixiong Zhu

Re: welcome a new batch of committers

2018-10-03 Thread Takuya UESHIN
> ...(Spark SQL)
>
> Please join me in welcoming them!

Re: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread Takuya UESHIN
> ...expr, Literal(value))
>   }
> }
>
> It does pattern matching to detect whether value is of type Column. If yes, it will use the .expr of the column; otherwise it will work as it used to.
>
> Any suggestion or opinion on the proposition?
>
> Kind regards,
> Chongguang LIU

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Takuya UESHIN
> ...contributing across several areas of Spark for a while, focusing especially on the analyzer and optimizer in Spark SQL. Please join me in welcoming Zhenhua!
>
> Wenchen

Re: Welcoming some new committers

2018-03-02 Thread Takuya UESHIN
> ...(contributor to Kubernetes support and other parts of Spark)
> - Seth Hendrickson (contributor to MLlib and PySpark)
>
> Please join me in welcoming Anirudh, Bryan, Cody, Erik, Matt and Seth as committers!
>
> Matei

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-22 Thread Takuya UESHIN
> ...that impact compatibility should be worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as appropriate.
>
> ===
> Why is my bug not fixed?
> ===
>
> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.2.0. That being said, if there is something which is a regression from 2.2.0 and has not been correctly targeted please ping me or a committer to help target the issue (you can see the open issues listed as impacting Spark 2.3.0 at https://s.apache.org/WmoI).

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Takuya UESHIN
> ...Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).

Re: [discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?

2017-11-15 Thread Takuya UESHIN
...is less than I expected, I definitely support it. It should speed up those cool changes.

On 14 Nov 2017 7:14 pm, "Takuya UESHIN" <ues...@happy-camper.st> wrote:

> Hi all,
>
> I'd like to raise a discussion about Pa...

[discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?

2017-11-14 Thread Takuya UESHIN
...DataFrame from pandas DataFrame with Arrow
- https://github.com/apache/spark/pull/19646

Any comments are welcome!

Thanks.
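A minimal sketch of the guard such a minimum-version policy enables (the helper name is hypothetical, not the one PySpark adopted):

    from distutils.version import LooseVersion

    import pandas as pd

    MINIMUM_PANDAS = "0.19.2"

    def require_minimum_pandas(minimum: str = MINIMUM_PANDAS) -> None:
        # Fail fast with a clear message instead of failing mid-job.
        if LooseVersion(pd.__version__) < LooseVersion(minimum):
            raise ImportError(
                "pandas >= %s must be installed; found %s."
                % (minimum, pd.__version__)
            )

    require_minimum_pandas()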

Re: Welcoming Tejas Patil as a Spark committer

2017-10-03 Thread Takuya UESHIN
ently added Tejas Patil as a committer on the >> project. Tejas has been contributing across several areas of Spark for >> a while, focusing especially on scalability issues and SQL. Please >> join me in welcoming Tejas! >> >> Matei >> >> -----

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-12 Thread Takuya UESHIN
> +1
>
> On Mon, Sep 11, 2017 at 5:47 PM, Sameer Agarwal wrote:
>> +1 (non-binding)
>>
>> On Thu, Sep 7, 2017 at 9:10 PM, Bryan Cut...

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-06 Thread Takuya UESHIN
> ...decorator name so that it could also be usable for other efficient vectorized formats in the future? Or do we anticipate the decorator to be format-specific, with more to come in the future?
>
> From: Reynold Xin <r...@...

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Takuya UESHIN
...the current effort, and we will be adding those later?

On Fri, Sep 1, 2017 at 8:01 AM Takuya UESHIN <ues...@happy-camper.st> wrote:

> Hi all,
>
> We've been discussing supporting vectorized UDFs in Python, and we almost got a consensus about the...

[VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Takuya UESHIN
...forward and implement the SPIP.
+0: Don't really care.
-1: I don't think this is a good idea because of the following technical reasons.

Thanks!
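The SPIP became the pandas_udf API in Spark 2.3; a minimal sketch in the later type-hint style (Spark >= 3.0; the original 2.3 API used an explicit PandasUDFType argument):

    import pandas as pd

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()

    @pandas_udf("long")
    def plus_one(v: pd.Series) -> pd.Series:
        # Runs on whole Arrow batches rather than one row at a time.
        return v + 1

    spark.range(3).select(plus_one("id").alias("id_plus_one")).show()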

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Takuya UESHIN

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Takuya UESHIN
> ...Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as committers. Join me in congratulating both of them and thanking them for their contributions to the project!
>
> Matei

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Takuya UESHIN
Congrats!

Kazuaki Ishizaki

From: Reynold Xin <r...@databricks.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date: 2017/02/14 04:18
Subject: welcoming Takuya Ueshin as a new Apache Spark committer

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-19 Thread Takuya UESHIN
> If you are a Spark user, you can help us test this release by taking an existing Apache Spark workload, running it on this candidate, and reporting any regressions.

Re: What is the correct Spark version of master/branch-1.0?

2014-06-04 Thread Takuya UESHIN
Thank you for your reply. I've sent pull requests.

Thanks.

2014-06-05 3:16 GMT+09:00 Patrick Wendell <pwend...@gmail.com>:

> It should be 1.1-SNAPSHOT. Feel free to submit a PR to clean up any inconsistencies.
>
> On Tue, Jun 3, 2014 at 8:33 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>> Hi all...

What is the correct Spark version of master/branch-1.0?

2014-06-03 Thread Takuya UESHIN
...(d96794132e37cf57f8dd945b9d11f8adcfc30490):

- pom.xml: 1.0.1-SNAPSHOT
- SparkBuild.scala: 1.0.0

Should it be 1.0.1-SNAPSHOT?

Thanks.