Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Dongjoon Hyun
Hi, Emil. HADOOP-18568 is still open and it seems to be never a part of the Hadoop trunk branch. Do you mean another JIRA? Dongjoon. On Tue, Aug 1, 2023 at 2:59 AM Emil Ejbyfeldt wrote: > Hi, > > We previously ran some experiments on builds from the 3.5 branch and > noticed that Hadoop had

Re: [Reminder] Spark 3.5 RC Cut

2023-08-01 Thread Emil Ejbyfeldt
Hi, We previously ran some experiments on builds from the 3.5 branch and noticed that Hadoop had a regression (https://issues.apache.org/jira/browse/HADOOP-18568) in their s3a committer affecting 3.3.5 and 3.3.6 (Spark 3.4 uses hadoop 3.3.4). This fix has been merged into Hadoop and will be

Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Ruifeng Zheng
+1, thank you Yuming On Tue, Aug 1, 2023 at 10:40 AM Yuming Wang wrote: > Thank you. I will prepare 3.3.3-rc1 soon. > > On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun > wrote: > >> +1 >> >> Thank you for volunteering, Yuming. >> >> Dongjoon >> >> >> On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang

Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Yuming Wang
Thank you. I will prepare 3.3.3-rc1 soon. On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun wrote: > +1 > > Thank you for volunteering, Yuming. > > Dongjoon > > > On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang wrote: > >> Hi Spark devs, >> >> Since Apache Spark 3.3.2 tag creation (Feb 11), 60

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Hyukjin Kwon
+1 On Sat, 29 Jul 2023 at 22:49, Maciej wrote: > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 7/29/23 11:28, Mich Talebzadeh wrote: > > +1 for me. > > Though Databricks did a good job releasing the code. > > GitHub - databricks/spark-xml:

Re: Time for Spark 3.3.3 release?

2023-07-29 Thread Dongjoon Hyun
+1 Thank you for volunteering, Yuming. Dongjoon On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang wrote: > Hi Spark devs, > > Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches > have > arrived at branch-3.3. > > Shall we make

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC On 7/29/23 11:28, Mich Talebzadeh wrote: +1 for me. Though Databricks did a good job releasing the code. GitHub - databricks/spark-xml: XML data source for Spark SQL and DataFrames

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Mich Talebzadeh
+1 for me. Though Databricks did a good job releasing the code. GitHub - databricks/spark-xml: XML data source for Spark SQL and DataFrames Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir

[Reminder] Spark 3.5 RC Cut

2023-07-29 Thread Yuanjian Li
Hi everyone, Following the release timeline, I will cut the RC on *Tuesday, Aug 1st at 1 pm PST* as scheduled.
Date | Event
July 17th 2023 | Code freeze. Release branch cut.
Late July 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
August 2023

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Jia Fan
+1 > On Jul 29, 2023, at 13:06, Adrian Pop-Tifrea wrote: > > +1, the more data source formats, the better, and if the solution is already > thoroughly tested, I say we should go for it. > > On Sat, Jul 29, 2023, 06:35 Xiao Li > wrote: >> +1 >> >> On Fri, Jul 28, 2023 at

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Adrian Pop-Tifrea
+1, the more data source formats, the better, and if the solution is already thoroughly tested, I say we should go for it. On Sat, Jul 29, 2023, 06:35 Xiao Li wrote: > +1 > > On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote: > >> +1 I think that porting the package 'as is' into Spark is probably

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Xiao Li
+1 On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote: > +1 I think that porting the package 'as is' into Spark is probably > worthwhile. > That's relatively easy; the code is already pretty battle-tested and not > that big and even originally came from Spark code, so is more or less > similar

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Sean Owen
+1 I think that porting the package 'as is' into Spark is probably worthwhile. That's relatively easy; the code is already pretty battle-tested and not that big and even originally came from Spark code, so is more or less similar already. One thing it never got was DSv2 support, which means XML

[VOTE] SPIP: XML data source support

2023-07-28 Thread Sandip Agarwala
Dear Spark community, I would like to start the vote for "SPIP: XML data source support". XML is a widely used data format. An external spark-xml package ( https://github.com/databricks/spark-xml) is available to read and write XML data in spark. Making spark-xml built-in will provide a better

Re: Apache Arrow integration issue with Spark involving Netty

2023-07-28 Thread Dane Pitkin
Update! Netty has reverted the affecting change in v4.1.96. See netty commit here[1] and arrow PR to upgrade here[2]. The upcoming release of arrow-memory-netty v13 should work with netty versions <4.1.94 and >=4.1.96. [1]
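The version window quoted above (Netty versions <4.1.94 and >=4.1.96) can be sketched as a small check. This is an illustrative helper only, not part of Spark, Arrow, or Netty; the function names are hypothetical, and the rule is taken at face value from the message, which itself says the combination "should work".

```python
def parse_version(version):
    """Turn a string such as '4.1.96.Final' into (4, 1, 96).

    Parsing stops at the first non-numeric piece, so a qualifier like
    'Final' is ignored.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def netty_in_reported_window(netty_version):
    """Hypothetical check for the window reported in this thread:
    versions below 4.1.94 or at/above 4.1.96 are expected to work with
    arrow-memory-netty v13; 4.1.94 and 4.1.95 are not.
    """
    v = parse_version(netty_version)
    return v < (4, 1, 94) or v >= (4, 1, 96)


for candidate in ["4.1.93.Final", "4.1.94.Final", "4.1.95.Final", "4.1.96.Final"]:
    print(candidate, netty_in_reported_window(candidate))
```

Comparing tuples of integers avoids the usual string-comparison trap where "4.1.100" would sort before "4.1.94".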

Time for Spark 3.3.3 release?

2023-07-28 Thread Yuming Wang
Hi Spark devs, Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches have arrived at branch-3.3. Shall we make a new release, Apache Spark 3.3.3, as the third release at branch-3.3? I'd like to volunteer as the release manager

Re: Spark 3.0.0 EOL

2023-07-26 Thread Manu Zhang
Yes, I'm referring to this line. > The last minor release within a major release will typically be maintained for longer as an “LTS” release Basically, I'm asking whether 3.5 will be the last 3.x release since we are already discussing 4.0. Thanks, Manu On Wed, Jul 26, 2023 at 7:35 PM Sean

Re: Spark 3.0.0 EOL

2023-07-26 Thread Sean Owen
There aren't "LTS" releases, though you might expect the last 3.x release to see maintenance releases for longer. See the end of https://spark.apache.org/versioning-policy.html On Wed, Jul 26, 2023 at 3:56 AM Manu Zhang wrote: > Will Apache Spark 3.5 be an LTS version? > > Thanks, > Manu > > On Mon,

Re: Spark 3.0.0 EOL

2023-07-26 Thread Manu Zhang
Will Apache Spark 3.5 be an LTS version? Thanks, Manu On Mon, Jul 24, 2023 at 4:26 PM Dongjoon Hyun wrote: > As Hyukjin replied, Apache Spark 3.0.0 is already in EOL status. > > To Pralabh, FYI, in the community, > > - Apache Spark 3.2 also reached the EOL already. >

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Mich Talebzadeh
Personally, I have not done it myself. I have CCed the Spark user group in case a user there has tried it. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
Spark 3.3 in OSS is built with parquet 1.12. Just compiling with parquet 1.10 results in a build failure, so I am wondering if anyone has built and compiled Spark 3.3 with parquet 1.10. Regards Pralabh Kumar On Mon, Jul 24, 2023 at 3:04 PM Mich Talebzadeh wrote: > Hi, > > Where is this limitation

Re: Spark 3.3 + parquet 1.10

2023-07-24 Thread Mich Talebzadeh
Hi, Where is this limitation coming from (using 1.10)? That is a 2018 build. Have you tried Spark 3.3 with parquet writes as is? Just a small PoC will prove it. HTH Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin

Spark 3.3 + parquet 1.10

2023-07-24 Thread Pralabh Kumar
Hi Dev community. I have a quick question with respect to Spark 3.3. Currently Spark 3.3 is built with parquet 1.12. However, has anyone tried Spark 3.3 with parquet 1.10? We at Uber are planning to migrate to Spark 3.3, but we are limited to using parquet 1.10. Has anyone tried building Spark

Re: Spark 3.0.0 EOL

2023-07-24 Thread Dongjoon Hyun
As Hyukjin replied, Apache Spark 3.0.0 is already in EOL status. To Pralabh, FYI, in the community, - Apache Spark 3.2 also reached the EOL already. https://lists.apache.org/thread/n4mdfwr5ksgpmrz0jpqp335qpvormos1 If you are considering Apache Spark 4, here is the other 3.x timeline, -

Re: Spark 3.0.0 EOL

2023-07-24 Thread Hyukjin Kwon
It's already EOL On Mon, Jul 24, 2023 at 4:17 PM Pralabh Kumar wrote: > Hi Dev Team > > If possible, can you please provide the Spark 3.0.0 EOL timelines. > > Regards > Pralabh Kumar

Spark 3.0.0 EOL

2023-07-24 Thread Pralabh Kumar
Hi Dev Team, If possible, can you please provide the Spark 3.0.0 EOL timelines? Regards, Pralabh Kumar

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-23 Thread Mich Talebzadeh
Worth noting that these official Dockerfiles (https://hub.docker.com/_/spark) were created with a valid Java version, openjdk version "11.0.19" 2023-04-18: docker run -it apache/spark:3.4.1-scala2.12-java11-r-ubuntu /bin/bash ## downloaded from the above repository

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Mich Talebzadeh
Yes thanks, I know the answer. That was not what I was looking for. The provided script should work one way or another, which it does not. Good that someone has already raised the issue; that Jira was raised in Oct 2022. Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Bjørn Jørgensen
https://hub.docker.com/_/openjdk DEPRECATION NOTICE This image is officially deprecated and all users are recommended to find and use suitable replacements ASAP. Some examples of other Official Image alternatives (listed in alphabetical order with no intentional or implied preference): -

Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Mich Talebzadeh
Hi, I was checking the contents of the Dockerfile for Java in the Spark directory, i.e. ${SPARK_HOME}/kubernetes/dockerfiles/spark/Dockerfile, in version 3.4.1. I recall that in 3.4.0, I made an adjustment to the Dockerfile content, replacing #ARG java_image_tag=17-jre #FROM eclipse-temurin:${java_image_tag}

Re: Spark Docker Official Image is now available

2023-07-22 Thread Mich Talebzadeh
Hi, It would help if the Spark binaries were added to PATH in the Docker images. They used to be there in previous versions like 3.1.3. docker run -it apache/spark:3.4.1-scala2.12-java11-r-ubuntu /bin/bash spark@e48cc28ff89e:/opt/spark/work-dir$ which spark-submit spark@e48cc28ff89e:/opt/spark/work-dir$
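The request above amounts to putting the Spark bin directory on PATH so that `which spark-submit` finds it. A minimal sketch of that PATH manipulation follows; the directory /opt/spark/bin is an assumption inferred from the commands quoted elsewhere in this archive, and the helper name is hypothetical. It is not how the official image is configured.

```python
import os

# Assumed install location, inferred from commands such as
# /opt/spark/bin/spark-shell that appear in this archive.
SPARK_BIN = "/opt/spark/bin"


def with_spark_on_path(path_value, spark_bin=SPARK_BIN):
    """Return path_value with spark_bin prepended, unless already present.

    Empty entries are dropped so the result never starts or ends with a
    separator.
    """
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if spark_bin in entries:
        return os.pathsep.join(entries)
    return os.pathsep.join([spark_bin] + entries)


# Example: compute an updated PATH for the current process environment.
updated = with_spark_on_path(os.environ.get("PATH", ""))
print(updated.split(os.pathsep)[0:3])
```

Inside a container the same effect is usually achieved once at image build time rather than per process; the function only illustrates the idempotent prepend.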

Re: Spark Docker Official Image is now available

2023-07-20 Thread Kent Yao
Thank you, Yikun! Kent On Thu, Jul 20, 2023 at 19:25, Dongjoon Hyun wrote: > Thank you! > > Dongjoon > > On Thu, Jul 20, 2023 at 8:40 AM Xiao Li > wrote: > >> Thank you, Yikun! This is great! >> >> On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: >> >>> Awesome, thank you YiKun for driving this! >>>

Re: Spark Docker Official Image is now available

2023-07-20 Thread Dongjoon Hyun
Thank you! Dongjoon On Thu, Jul 20, 2023 at 8:40 AM Xiao Li wrote: > Thank you, Yikun! This is great! > > On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: > >> Awesome, thank you YiKun for driving this! >> >> On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon >> wrote: >> >>> This is amazing,

Re: Spark Docker Official Image is now available

2023-07-20 Thread Xiao Li
Thank you, Yikun! This is great! On Wed, Jul 19, 2023 at 7:55 PM Ruifeng Zheng wrote: > Awesome, thank you YiKun for driving this! > > On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon wrote: > >> This is amazing, finally! >> >> On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: >> >>> The spark

Re: Spark Docker Official Image is now available

2023-07-19 Thread Ruifeng Zheng
Awesome, thank you YiKun for driving this! On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon wrote: > This is amazing, finally! > > On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: > >> The spark Docker Official Image is now available: >> https://hub.docker.com/_/spark >> >> $ docker run -it --rm

Re: Spark Docker Official Image is now available

2023-07-19 Thread Hyukjin Kwon
This is amazing, finally! On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: > The Spark Docker Official Image is now available: > https://hub.docker.com/_/spark > > $ docker run -it --rm spark /opt/spark/bin/spark-shell > $ docker run -it --rm spark:python3 /opt/spark/bin/pyspark > $ docker

Spark Docker Official Image is now available

2023-07-19 Thread Yikun Jiang
The Spark Docker Official Image is now available: https://hub.docker.com/_/spark $ docker run -it --rm spark /opt/spark/bin/spark-shell $ docker run -it --rm spark:python3 /opt/spark/bin/pyspark $ docker run -it --rm spark:r /opt/spark/bin/sparkR We had a longer review journey than we

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Maciej
That's a great idea, as long as we can keep additional dependencies under control. Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 7/19/23 18:22, Franco Patano wrote: +1 Many people have struggled with incorporating this separate library into their Spark

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Franco Patano
+1 Many people have struggled with incorporating this separate library into their Spark pipelines. On Wed, Jul 19, 2023 at 10:53 AM Burak Yavuz wrote: > +1 on adding to Spark. Community involvement will make the XML reader > better. > > Best, > Burak > > On Wed, Jul 19, 2023 at 3:25 AM Martin

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Burak Yavuz
+1 on adding to Spark. Community involvement will make the XML reader better. Best, Burak On Wed, Jul 19, 2023 at 3:25 AM Martin Andersson wrote: > Alright, makes sense to add it then. > -- > *From:* Hyukjin Kwon > *Sent:* Wednesday, July 19, 2023 11:01 > *To:*

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
Alright, makes sense to add it then. From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 11:01 To: Martin Andersson Cc: Sandip Agarwala ; dev@spark.apache.org Subject: Re: [DISCUSS] SPIP: XML data source support EXTERNAL SENDER. Do not click links or open

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Hyukjin Kwon
Here are the benefits of having it as a built-in source: - We can leverage the community to improve the Spark XML (not within Databricks repositories). - We can share the same core for XML expressions (e.g., from_xml and to_xml like from_csv, from_json, etc.). - It is more to
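To make the idea of an XML source or a from_xml-style expression concrete, here is a minimal sketch of the underlying transformation: repeated row elements become flat records. It uses only Python's standard library and is not the spark-xml implementation or its API; the function name, the row tag "book", and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def rows_from_xml(xml_text, row_tag):
    """Yield one dict per <row_tag> element, mapping each child tag
    name to its stripped text content."""
    root = ET.fromstring(xml_text.strip())
    for element in root.iter(row_tag):
        yield {child.tag: (child.text or "").strip() for child in element}


# Hypothetical sample input, for illustration only.
sample = """
<catalog>
  <book><title>Spark</title><year>2023</year></book>
  <book><title>XML</title><year>1998</year></book>
</catalog>
"""

records = list(rows_from_xml(sample, "book"))
for record in records:
    print(record)
```

A real data source also has to handle schema inference, attributes, nesting, and malformed input, which is part of why the thread argues for sharing one maintained core.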

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Martin Andersson
How much of an effort is it to use the spark-xml library today? What's the drawback to keeping this as an external library as-is? Best Regards, Martin From: Hyukjin Kwon Sent: Wednesday, July 19, 2023 01:27 To: Sandip Agarwala Cc: dev@spark.apache.org Subject:

Re: [DISCUSS] SPIP: XML data source support

2023-07-18 Thread Hyukjin Kwon
Yeah, I support this. XML is a pretty outdated format TBH, but it is still used in many legacy systems. For example, the Wikipedia dump is one case. Even when you look at usage stats for CSV vs XML vs JSON, some show that XML is used more than CSV. On Wed, Jul 19, 2023 at 12:58 AM Sandip Agarwala <

Re: Spark Scala SBT Local build fails

2023-07-18 Thread Varun Shah
++ DEV community On Mon, Jul 17, 2023 at 4:14 PM Varun Shah wrote: > Resending this message with a proper Subject line > > Hi Spark Community, > > I am trying to set up my forked apache/spark project locally for my 1st > Open Source Contribution, by building and creating a package as mentioned

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Yuanjian Li
Further reminder for the release timeline:
Date | Event
July 17th 2023 | Code freeze. Release branch cut.
Late July 2023 | QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
August 2023 | Release candidates (RC), voting, etc. until final release passes
Please

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Raghu Angadi
Thanks Yuanjian for accepting these for warmfix. Raghu. On Mon, Jul 17, 2023 at 1:04 PM Yuanjian Li wrote: > Hi, all > > FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 > > Here is the complete list of exception merge requests received before the > cut: > >- > >

Re: Spark 3.5 Branch Cut

2023-07-17 Thread Dongjoon Hyun
Thank you so much, Yuanjian! Dongjoon. On Mon, Jul 17, 2023 at 1:05 PM Yuanjian Li wrote: > Hi, all > > FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 > > Here is the complete list of exception merge requests received before the > cut: > >- > >SPARK-44421:

Spark 3.5 Branch Cut

2023-07-17 Thread Yuanjian Li
Hi, all FYI, I cut branch-3.5 as https://github.com/apache/spark/tree/branch-3.5 Here is the complete list of exception merge requests received before the cut: - SPARK-44421: Reattach to existing execute in Spark Connect (server mechanism) - SPARK-44423: Reattach to existing

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-16 Thread Herman van Hovell
Hi Yuanjian, For the ongoing encoder work for the connect scala client I'd like to get the following tickets in: - SPARK-44396 : Direct Arrow Deserialization - SPARK-9 :

Re: Data Contracts

2023-07-16 Thread Phillip Henry
No worries. Have you had a chance to look at it? Since this thread has gone dead, I assume there is no appetite for adding data contract functionality..? Regards, Phillip On Mon, 19 Jun 2023, 11:23 Deepak Sharma, wrote: > Sorry for using simple in my last email . > It’s not gonna to be

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-15 Thread Enrico Minack
Speaking of JdbcDialect, is there any interest in getting upserts for JDBC into 3.5.0? [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC: https://github.com/apache/spark/pull/41518 [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC using MERGE INTO with temp table:

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Jia Fan
Can we put [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect into 3.5.0? https://github.com/apache/spark/pull/41855 Since this is the last major version update of 3.x, I think we need to make sure JdbcDialect can support more databases. On Sat, Jul 15, 2023, Gengliang Wang

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Gengliang Wang
Hi Yuanjian, Besides the above-mentioned changes, it would be great to include the UI page for Spark Connect: SPARK-44394. Best Regards, Gengliang On Fri, Jul 14, 2023 at 11:44 AM Julek Sompolski wrote: > Thank you, > My changes that you

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Julek Sompolski
Thank you, My changes that you listed are tracked under this Epic: https://issues.apache.org/jira/browse/SPARK-43754 I am also working on https://issues.apache.org/jira/browse/SPARK-44422, didn't mention it before because I have hopes that this one will make it before the cut. (Unrelated) My

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Raghu Angadi
Thank you. We plan to get the remaining major pieces for Streaming Spark Connect (Epic SPARK-42938). I would like to request a warmfix exception for the following tweaks and improvements over the next two weeks (all in the same epic). -

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Yuanjian Li
Thanks for raising all the requests. Let's stick to the previously agreed branch cut time. Based on past practice, let's label the above requests as exception features. I have just sent out a branch cut reminder titled "[Reminder] Spark 3.5 Branch Cut." Please ensure that all your requests are

[Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Yuanjian Li
Hi everyone, As discussed earlier in "Time for Spark v3.5.0 release", I will cut branch-3.5 on *Monday, July 17th at 1 pm PST* as scheduled. Please plan your PR merge accordingly with the given timeline. Currently, we have received the following exception merge requests: - SPARK-44421:

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Julek Sompolski
I am working on SPARK-44421, SPARK-44423 and SPARK-44424 in Spark Connect to support execution reconnection. A week or two of warmfix grace period would be much appreciated for this work. Best regards, Juliusz Sompolski On Fri, Jul 14, 2023 at 5:40 PM Raghu Angadi wrote: > We have a bunch of

Re: Time for Spark v3.5.0 release

2023-07-14 Thread Raghu Angadi
We have a bunch of work in progress for Spark Connect trying to meet the branch cut deadline. Moving to 17th is certainly welcome. Is it feasible to extend it by a couple of more days? Alternatively, we could have a relaxed warmfix process for Spark Connect code for a week or two since it does

Unsubscribe

2023-07-13 Thread Dumas Hwang

unsubscribe

2023-07-13 Thread Raffael Bottoli Schemmer
unsubscribe

Re: Apache Arrow integration issue with Spark involving Netty

2023-07-13 Thread Dane Pitkin
I just want to add that there is a Spark Jira issue[1] for upgrading Netty once Arrow v13.0.0 is released this month. [1] https://issues.apache.org/jira/projects/SPARK/issues/SPARK-44212 On Thu, Jul 6, 2023 at 2:25 PM Dane Pitkin wrote: > Hi all, > > The next release of Apache Arrow v13.0.0

Re: [VOTE][RESULT] Python Data Source API

2023-07-11 Thread Mich Talebzadeh
Hi Allison, Great job and thanks for your efforts in driving this. Looking forward to seeing it in action soon! Best Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

[VOTE][RESULT] Python Data Source API

2023-07-10 Thread Allison Wang
The vote passes with 12 +1s (8 binding +1s) and one +0 (binding). (* = binding) +1: - Hyukjin Kwon * - Xiao Li * - Denny Lee - Martin Grund - Mich Talebzadeh - Huaxin Gao * - Holden Karau * - Reynold Xin * - Jungtaek Lim - Ruifeng Zheng * - Takuya Ueshin * - Matei Zaharia * +0: Maciej
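The tally in this result can be checked mechanically from the names listed in the message. The sketch below simply encodes that list (True marks a binding vote, as flagged with * above); the data structure is an illustration, not an official ASF voting tool.

```python
# (voter, is_binding) pairs copied from the +1 list in the message above.
plus_one_votes = [
    ("Hyukjin Kwon", True),
    ("Xiao Li", True),
    ("Denny Lee", False),
    ("Martin Grund", False),
    ("Mich Talebzadeh", False),
    ("Huaxin Gao", True),
    ("Holden Karau", True),
    ("Reynold Xin", True),
    ("Jungtaek Lim", False),
    ("Ruifeng Zheng", True),
    ("Takuya Ueshin", True),
    ("Matei Zaharia", True),
]

total_plus_one = len(plus_one_votes)
binding_plus_one = sum(1 for _, is_binding in plus_one_votes if is_binding)
non_binding_plus_one = total_plus_one - binding_plus_one

print(total_plus_one, binding_plus_one, non_binding_plus_one)
```

The counts agree with the summary line: 12 +1s in total, 8 of them binding.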

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Jungtaek Lim
Just to be fully sure, SPIP does not cover streaming, but if the performance is not great compared to the JVM based implementation in any way (which I expect so), I don't think it's good to integrate with streaming which targets lower latency. That's the reason I gave +1 although it's not covering

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Matei Zaharia
+1 > On Jul 10, 2023, at 10:19 AM, Takuya UESHIN wrote: > > +1 > > On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng > wrote: >> +1 >> >> On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim > > wrote: >>> +1 >>> >>> On Sat, Jul 8, 2023

Re: [VOTE][SPIP] Python Data Source API

2023-07-10 Thread Takuya UESHIN
+1 On Sun, Jul 9, 2023 at 10:05 PM Ruifeng Zheng wrote: > +1 > > On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim > wrote: > >> +1 >> >> On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin >> wrote: >> >>> +1! >>> >>> >>> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau >>> wrote: >>> +1 On

Unsubscribe

2023-07-10 Thread Bode, Meikel
Unsubscribe

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Ruifeng Zheng
+1 On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim wrote: > +1 > > On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin > wrote: > >> +1! >> >> >> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau >> wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao >>> wrote: >>> +1 On Fri,

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Jungtaek Lim
+1 On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin wrote: > +1! > > > On Fri, Jul 7 2023 at 11:58 AM, Holden Karau > wrote: > >> +1 >> >> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < >>> mich.talebza...@gmail.com> wrote: >>>

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Reynold Xin
+1! On Fri, Jul 7 2023 at 11:58 AM, Holden Karau < hol...@pigscanfly.ca > wrote: > > +1 > > > On Fri, Jul 7, 2023 at 9:55 AM huaxin gao < huaxin.ga...@gmail.com > wrote: > > > >> +1 >> >> >> On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh < mich.talebza...@gmail.com >> > wrote: >> >>

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: > +1 > > On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh > wrote: > >> +1 for me >> >> Mich Talebzadeh, >> Solutions Architect/Engineering Lead >> Palantir Technologies Limited >> London >> United Kingdom >> >> >>view my Linkedin profile

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread huaxin gao
+1 On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh wrote: > +1 for me > > Mich Talebzadeh, > Solutions Architect/Engineering Lead > Palantir Technologies Limited > London > United Kingdom > > >view my Linkedin profile > > > >

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk.

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Martin Grund
+1 (non-binding) On Fri, Jul 7, 2023 at 12:05 AM Denny Lee wrote: > +1 (non-binding) > > On Fri, Jul 7, 2023 at 00:50 Maciej wrote: > >> +0 >> >> Best regards, >> Maciej Szymkiewicz >> >> Web: https://zero323.net >> PGP: A30CEF0C31A501EC >> >> On 7/6/23 17:41, Xiao Li wrote: >> >> +1 >> >>

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Denny Lee
+1 (non-binding) On Fri, Jul 7, 2023 at 00:50 Maciej wrote: > +0 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 7/6/23 17:41, Xiao Li wrote: > > +1 > > Xiao > > On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: > >> +1. >> >> See

Apache Arrow integration issue with Spark involving Netty

2023-07-06 Thread Dane Pitkin
Hi all, The next release of Apache Arrow v13.0.0 coming this month[1] has upgraded Netty to v4.1.94.Final[2] due to a moderate severity CVE[3]. We are seeing that Spark using Netty v4.1.93.Final is not compatible with Arrow v13.0.0, throwing an exception at runtime[4]. There has been some talk in

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Maciej
+0 Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC On 7/6/23 17:41, Xiao Li wrote: +1 Xiao On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: +1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: Hi all,

Re: [VOTE][SPIP] Python Data Source API

2023-07-06 Thread Xiao Li
+1 Xiao On Wed, Jul 5, 2023 at 17:28, Hyukjin Kwon wrote: > +1. > > See https://youtu.be/yj7XlTB1Jvc?t=604 :-). > > On Thu, 6 Jul 2023 at 09:15, Allison Wang > wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Python Data Source API. >> >> The high-level summary for the SPIP is that it aims to

Re: [VOTE][SPIP] Python Data Source API

2023-07-05 Thread Hyukjin Kwon
+1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Python Data Source API. > > The high-level summary for the SPIP is that it aims to introduce a simple > API in Python for Data Sources. The idea

[VOTE][SPIP] Python Data Source API

2023-07-05 Thread Allison Wang
Hi all, I'd like to start the vote for SPIP: Python Data Source API. The high-level summary for the SPIP is that it aims to introduce a simple API in Python for Data Sources. The idea is to enable Python developers to create data sources without learning Scala or dealing with the complexities of

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Xinrong Meng
+1 Thank you! On Tue, Jul 4, 2023 at 3:04 PM Jungtaek Lim wrote: > +1 > > On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > >> +1 >> >> Thanks Yuanjian. >> >> On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: >> > >> > +1 >> > >> > >> > >> > From: Maxim Gekk >> > Date: Tuesday, July 4, 2023 17:24 >> >

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jungtaek Lim
+1 On Wed, Jul 5, 2023 at 2:23 AM L. C. Hsieh wrote: > +1 > > Thanks Yuanjian. > > On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > > > +1 > > > > > > > > From: Maxim Gekk > > Date: Tuesday, July 4, 2023 17:24 > > To: Kent Yao > > Cc: "dev@spark.apache.org" > > Subject: Re: Time for Spark v3.5.0

Re: Time for Spark v3.5.0 release

2023-07-04 Thread L. C. Hsieh
+1 Thanks Yuanjian. On Tue, Jul 4, 2023 at 7:45 AM yangjie01 wrote: > > +1 > > > > From: Maxim Gekk > Date: Tuesday, July 4, 2023 17:24 > To: Kent Yao > Cc: "dev@spark.apache.org" > Subject: Re: Time for Spark v3.5.0 release > > > > +1 > > On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > > +1, thank you

Re: Time for Spark v3.5.0 release

2023-07-04 Thread yangjie01
+1 From: Maxim Gekk Date: Tuesday, July 4, 2023 17:24 To: Kent Yao Cc: "dev@spark.apache.org" Subject: Re: Time for Spark v3.5.0 release +1 On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: +1, thank you Kent On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > +1 > > Thank you, Yuanjian

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jia Fan
+1 On Tue, Jul 4, 2023 at 17:23, Maxim Gekk wrote: > +1 > > On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > >> +1, thank you >> >> Kent >> >> On 2023/07/04 05:32:52 Dongjoon Hyun wrote: >> > +1 >> > >> > Thank you, Yuanjian >> > >> > Dongjoon >> > >> > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon >> wrote:

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Maxim Gekk
+1 On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > +1, thank you > > Kent > > On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > > +1 > > > > Thank you, Yuanjian > > > > Dongjoon > > > > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon > wrote: > > > > > Yeah one day postponed shouldn't be a big deal.

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Kent Yao
+1, thank you Kent On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > +1 > > Thank you, Yuanjian > > Dongjoon > > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon wrote: > > > Yeah one day postponed shouldn't be a big deal. > > > > On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > > > >> Hi All,

Re: Time for Spark v3.5.0 release

2023-07-03 Thread Dongjoon Hyun
+1 Thank you, Yuanjian Dongjoon On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon wrote: > Yeah one day postponed shouldn't be a big deal. > > On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > >> Hi All, >> >> According to the Spark versioning policy at >>

Re: Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Hyukjin Kwon
The demo was really amazing. On Tue, 4 Jul 2023 at 09:17, Farshid Ashouri wrote: > This is wonderful news! > > On Tue, 4 Jul 2023 at 01:14, Gengliang Wang wrote: > >> Dear Apache Spark community, >> >> We are delighted to announce the launch of a groundbreaking tool that >> aims to make Apache

Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Gengliang Wang
Dear Apache Spark community, We are delighted to announce the launch of a groundbreaking tool that aims to make Apache Spark more user-friendly and accessible - the English SDK . Powered by the application of Generative AI, the English SDK

Re: Time for Spark v3.5.0 release

2023-07-03 Thread Hyukjin Kwon
Yeah one day postponed shouldn't be a big deal. On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > Hi All, > > According to the Spark versioning policy at > https://spark.apache.org/versioning-policy.html, should we cut > *branch-3.5* on *July 17th, 2023*? (We initially proposed July 16th,

Time for Spark v3.5.0 release

2023-07-03 Thread Yuanjian Li
Hi All, According to the Spark versioning policy at https://spark.apache.org/versioning-policy.html, should we cut *branch-3.5* on *July 17th, 2023*? (We initially proposed July 16th, but since it's a Sunday, I suggest we postpone it by one day). I would like to volunteer as the release
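The scheduling reasoning here (a cut date that falls on a Sunday moves to the next day) can be checked with the standard library. This sketch assumes the originally proposed date was July 16th, 2023, i.e. one day before the July 17th cut date named in the message; the helper name is hypothetical and the weekend rule is an illustration, not a documented Spark policy.

```python
import datetime


def shift_off_weekend(day):
    """Move a date forward until it lands on a weekday (Mon-Fri)."""
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += datetime.timedelta(days=1)
    return day


# Assumption: the originally proposed date was July 16th, 2023.
proposed = datetime.date(2023, 7, 16)
cut_date = shift_off_weekend(proposed)

print(proposed.strftime("%A"), "->", cut_date.isoformat(), cut_date.strftime("%A"))
```

Under that assumption the proposed date is a Sunday and the shifted date is Monday, July 17th, 2023, which matches the branch cut announced later in the archive.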

Re: Beginner - Looking for starter issues

2023-06-29 Thread Jia Fan
Hi Harry, Maybe you can start with https://issues.apache.org/jira/browse/SPARK-37935 Jia Fan > On Jun 28, 2023, at 08:09, Harry wrote: > > Hi, > > I am looking to pick up some tasks on ASF Jira. > I have a basic understanding of how things work in the Spark code base.

Beginner - Looking for starter issues

2023-06-27 Thread Harry
Hi, I am looking to pick up some tasks on ASF Jira. I have a basic understanding of how things work in the Spark code base. So I am thinking if I can start with some simple tasks to get ramped up. I tried searching on JIRA open issues and there were many. It was confusing as some tasks are

Unsubscribe

2023-06-27 Thread Amogh Desai
Unsubscribe

[VOTE][RESULT] PySpark Test Framework

2023-06-26 Thread Amanda Liu
The vote passes with 10 +1s (nine binding +1s) and one +0. Thank you all for your participation and comments! (* = binding) +1: - Holden Karau (*) - Reynold Xin (*) - Mich Talebzadeh - Maciej Szymkiewicz (*) - Hyukjin Kwon (*) - Dongjoon Hyun (*) - Ruifeng Zheng (*) - Xinrong Meng (*) -

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-25 Thread Reynold Xin
Personally I'd love this, but I agree with some of the earlier comments that this should not be Python specific (meaning I should be able to implement a data source in Python and then make it usable across all languages Spark  supports). I think we should find a way to make this reusable beyond

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-25 Thread Maciej
Thanks for your feedback Martin. However, if the primary intended purpose of this API is to provide an interface for endpoint querying, then I find this proposal even less convincing. Neither the Spark execution model nor the data source API (full or restricted as proposed here) are a good
