Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-19 Thread Jungtaek Lim
+1 (non-binding) Thanks for driving the release! On Fri, Sep 19, 2025 at 5:49 PM Max Gekk wrote: > +1 > > On Thu, Sep 18, 2025 at 7:09 PM Kousuke Saruta wrote: > >> +1 >> >> On Fri, Sep 19, 2025 at 1:03 huaxin gao wrote: >> >>> +1 >>> Thanks Peter for driving the release! >>> >>> Huaxin >>> >>> On Thu, Sep 18

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-19 Thread Shaoyun Chen
+1 SPARK-46941[1] also fixed an issue with incorrect results. 1. https://issues.apache.org/jira/browse/SPARK-46941 On Wed, Sep 10, 2025 at 11:49 Yang Jie wrote: > > +1 > > On 2025/09/10 02:32:29 Wenchen Fan wrote: > > +1 > > > > On Wed, Sep 10, 2025 at 4:13 AM Mich Talebzadeh > > wrote: > > > > > Agreed +1

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-19 Thread Max Gekk
+1 On Thu, Sep 18, 2025 at 7:09 PM Kousuke Saruta wrote: > +1 > > On Fri, Sep 19, 2025 at 1:03 huaxin gao wrote: > >> +1 >> Thanks Peter for driving the release! >> >> Huaxin >> >> On Thu, Sep 18, 2025 at 8:54 AM kazuyuki tanimura >> wrote: >> >>> +1 (non-binding) >>> >>> Kazu >>> >>> >>> On Sep 18, 2025, at

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-18 Thread Wenchen Fan
+1 On Thu, Sep 18, 2025 at 7:13 AM wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.5.7. > > The vote is open until Sat, 20 Sep 2025 17:13:14 PDT and passes if a > majority +1 PMC votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package
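The vote email quoted above states the passing rule: a majority of +1 PMC (binding) votes, with a minimum of three +1 votes. A minimal Python sketch of one reading of that rule follows. It counts only binding votes, requires more binding +1s than -1s, and lets a voter's latest vote supersede an earlier one. The `Vote` type and `release_vote_passes` function are hypothetical and not part of any Apache tooling. This is also why voters tag their replies "(non-binding)": those votes are recorded but do not count toward the threshold.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True for PMC members (hypothetical field)


def release_vote_passes(votes: Iterable[Vote], min_binding_plus: int = 3) -> bool:
    """Sketch: pass with >= min_binding_plus binding +1s and more binding +1s than -1s."""
    # A later vote from the same voter replaces the earlier one (dict keeps the last value).
    latest = {v.voter: v for v in votes}
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= min_binding_plus and plus > minus
```

Non-binding +1s are kept in the input, so the same list can be used for reporting while the result depends only on PMC votes.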

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-18 Thread Wenchen Fan
+1 On Fri, Sep 19, 2025 at 8:29 AM Szehon Ho wrote: > +1 (non-binding) > > Thanks! > Szehon > > On Thu, Sep 18, 2025 at 4:46 PM Jungtaek Lim > wrote: > >> (I missed to clarify, my +1 is non-binding, just to make easier to count) >> >> On Fri, Sep 19, 2025 at 8:44 AM Jungtaek Lim < >> kabhwan.op

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-18 Thread Szehon Ho
+1 (non-binding) Thanks! Szehon On Thu, Sep 18, 2025 at 4:46 PM Jungtaek Lim wrote: > (I missed to clarify, my +1 is non-binding, just to make easier to count) > > On Fri, Sep 19, 2025 at 8:44 AM Jungtaek Lim > wrote: > >> +1 this sounds promising given the trends of reliance in AI. >> >> On W

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-18 Thread Jungtaek Lim
(I missed to clarify, my +1 is non-binding, just to make easier to count) On Fri, Sep 19, 2025 at 8:44 AM Jungtaek Lim wrote: > +1 this sounds promising given the trends of reliance in AI. > > On Wed, Sep 17, 2025 at 4:04 PM Kousuke Saruta wrote: > >> +1 >> >> On Wed, Sep 17, 2025 at 15:17 Dongjoon Hyun

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-18 Thread Jungtaek Lim
+1 this sounds promising given the trends of reliance in AI. On Wed, Sep 17, 2025 at 4:04 PM Kousuke Saruta wrote: > +1 > > On Wed, Sep 17, 2025 at 15:17 Dongjoon Hyun wrote: >> +1 >> >> Dongjoon >> >> On 2025/09/16 02:30:33 Jules Damji wrote: >> > + 1 (non-binding) >> > -- >> > Sent from my iPhone >> > Pardo

Re: My remote shuffle server is not working on AQE.

2025-09-18 Thread Asif Shahid
If the query returns an empty result without your shuffle service and with AQE turned on, then the issue could be 1) a bug in AQE or 2) wrong stats, resulting in the empty relation optimization kicking in. So check the stats: run the ANALYZE TABLE command before querying. If it happens only with your cust
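Point 2 can be illustrated with a toy model in plain Python. This is not Spark's optimizer. The model is a join that trusts reported row counts and short-circuits to an empty result when either side is reported as empty. With correct statistics the answer is right. With a wrong count of zero, the data is never read and the result is silently empty. Disabling the shortcut, which is loosely analogous to running with `spark.sql.adaptive.enabled=false`, restores the correct answer. The function and parameter names are hypothetical. In Spark itself, catalog statistics are refreshed with `ANALYZE TABLE <table> COMPUTE STATISTICS`.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]


def inner_join(left: List[Row], right: List[Row],
               left_rows_reported: Optional[int], right_rows_reported: Optional[int],
               trust_stats: bool = True) -> List[Tuple[int, str, str]]:
    """Toy hash join that, like a stats-driven optimizer, may skip work for 'empty' inputs."""
    if trust_stats and 0 in (left_rows_reported, right_rows_reported):
        # Empty-relation shortcut: the data is never read, so a wrong count of 0 gives [].
        return []
    # Build a hash index on the right side, then probe it with each left row.
    index: Dict[int, List[str]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]
```

Unknown statistics (`None`) do not trigger the shortcut, so only a *wrong* zero produces the empty result.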

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-18 Thread Kousuke Saruta
+1 On Fri, Sep 19, 2025 at 1:03 huaxin gao wrote: > +1 > Thanks Peter for driving the release! > > Huaxin > > On Thu, Sep 18, 2025 at 8:54 AM kazuyuki tanimura > wrote: >> +1 (non-binding) >> >> Kazu >> >> >> On Sep 18, 2025, at 8:43 AM, Dongjoon Hyun wrote: >> >> +1 >> >> Thank you for leading Apache Spark

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-18 Thread huaxin gao
+1 Thanks Peter for driving the release! Huaxin On Thu, Sep 18, 2025 at 8:54 AM kazuyuki tanimura wrote: > +1 (non-binding) > > Kazu > > > On Sep 18, 2025, at 8:43 AM, Dongjoon Hyun wrote: > > +1 > > Thank you for leading Apache Spark 3.5.7 release, Peter. > > Dongjoon > > On 2025/09/18 04:41:

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-18 Thread kazuyuki tanimura
+1 (non-binding) Kazu > On Sep 18, 2025, at 8:43 AM, Dongjoon Hyun wrote: > > +1 > > Thank you for leading Apache Spark 3.5.7 release, Peter. > > Dongjoon > > On 2025/09/18 04:41:46 Zhou Jiang wrote: >> + 1 >> >> >>> On Sep 17, 2025, at 16:14, pt...@apache.org wrote: >>> >>> Please vote

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-18 Thread Dongjoon Hyun
+1 Thank you for leading Apache Spark 3.5.7 release, Peter. Dongjoon On 2025/09/18 04:41:46 Zhou Jiang wrote: > + 1 > > > > On Sep 17, 2025, at 16:14, pt...@apache.org wrote: > > > > Please vote on releasing the following candidate as Apache Spark version > > 3.5.7. > > > > The vote is ope

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-17 Thread Dongjoon Hyun
+1 Dongjoon On 2025/09/16 02:30:33 Jules Damji wrote: > + 1 (non-binding) > -- > Sent from my iPhone > Pardon the dumb thumb typos :) > > > On Sep 15, 2025, at 3:26 PM, Allison Wang wrote: > > > > > > Hi all, > > > > I would like to start a vote on the SPIP: Add llms.txt files to Spark > >

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-17 Thread Augusto Vivaldelli
+1 On Mon, Sep 15, 2025 at 19:38, Gengliang Wang wrote: > +1 > > On Mon, Sep 15, 2025 at 3:25 PM Allison Wang > wrote: >> Hi all, >> >> I would like to start a vote on the SPIP: Add llms.txt files to Spark >> Documentation >> >> Discussion thread: >> https://lists.apache.org/thread/7rn

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-17 Thread Kousuke Saruta
+1 On Wed, Sep 10, 2025 at 17:44 Max Gekk wrote: > +1 > > On Wed, Sep 10, 2025 at 8:13 AM Shaoyun Chen wrote: > >> +1 >> >> SPARK-46941[1] also fixed an issue with incorrect results. >> >> 1. https://issues.apache.org/jira/browse/SPARK-46941 >> >> On Wed, Sep 10, 2025 at 11:49 Yang Jie wrote: >> > >> > +1 >> > >> > On 2025

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-09-17 Thread Zhou Jiang
+ 1 > On Sep 17, 2025, at 16:14, pt...@apache.org wrote: > > Please vote on releasing the following candidate as Apache Spark version > 3.5.7. > > The vote is open until Sat, 20 Sep 2025 17:13:14 PDT and passes if a majority > +1 PMC votes are cast, with > a minimum of 3 +1 votes. > > [ ] +

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-17 Thread huaxin gao
+1 Thanks Peter for volunteering! Huaxin On Tue, Sep 9, 2025 at 9:29 AM Dongjoon Hyun wrote: > +1 > > Yes, it's a perfect timing to deliver Spark 3.5.7. > > Thank you for volunteering for it, Peter. > > Dongjoon. > > On 2025/09/09 15:49:36 Peter Toth wrote: > > Hi dev list, > > > > Apache Spark

Re: Does GraphX accepting patches?

2025-09-17 Thread Sem
I'm fine with moving GraphX to GraphFrames, and it looks like we have almost reached a consensus about it among the GraphFrames maintainers. Quick question: what should I put in the NOTICE file of GraphFrames? Is it enough just to add the following: """ This project contains the code of Apache Spark GraphX Copyrigh

Re: [ANNOUNCE][3rd-Party] TypeScript Spark Connect client — ts-spark-connector

2025-09-17 Thread Augusto Vivaldelli
The PR is the following: https://github.com/apache/spark-website/pull/632 On Mon, 15 Sept 2025 at 17:53, Augusto Vivaldelli < augusto.a.vivalde...@gmail.com> wrote: > Hi all, > > I’d like to share a third-party project: **ts-spark-connector**, a > TypeScript client for Spark Connect that enables

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-17 Thread Kousuke Saruta
+1 On Wed, Sep 17, 2025 at 15:17 Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2025/09/16 02:30:33 Jules Damji wrote: > > + 1 (non-binding) > > -- > > Sent from my iPhone > > Pardon the dumb thumb typos :) > > > > > On Sep 15, 2025, at 3:26 PM, Allison Wang > wrote: > > > > > > > > > Hi all, > > > > > > I wou

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-16 Thread Augusto Vivaldelli
>> *Date: *Monday, September 15, 2025 at 10:38 AM >> *To: *DB Tsai >> *Cc: *Peter Toth , "dev@spark.apache.org" < >> dev@spark.apache.org> >> *Subject: *RE: [EXTERNAL] [DISCUSS] Release Apache Spark 3.5.7 >> >> >> >> Thanks for t

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-16 Thread Chao Sun
+1 On Tue, Sep 16, 2025 at 9:28 AM Rozov, Vlad wrote: > +1 > > > > Thank you, > > > > Vlad > > > > *From: *John Zhuge > *Date: *Monday, September 15, 2025 at 10:38 AM > *To: *DB Tsai > *Cc: *Peter Toth , "dev@spark.apache.org" <

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-16 Thread Rozov, Vlad
+1 Thank you, Vlad From: John Zhuge Date: Monday, September 15, 2025 at 10:38 AM To: DB Tsai Cc: Peter Toth , "dev@spark.apache.org" Subject: RE: [EXTERNAL] [DISCUSS] Release Apache Spark 3.5.7 Thanks for the help! On Mon, Sep 15, 2025 at 5:44 AM DB Tsai mailto:dbt...@dbtsai.c

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-15 Thread Denny Lee
+1 (non-binding) On Mon, Sep 15, 2025 at 6:12 PM Hyukjin Kwon wrote: > +1 > > On Tue, 16 Sept 2025 at 09:53, Augusto Vivaldelli < > augusto.a.vivalde...@gmail.com> wrote: > >> +1 >> >> On Mon, Sep 15, 2025 at 19:38, Gengliang Wang >> wrote: >> >>> +1 >>> >>> On Mon, Sep 15, 2025 at 3:25 

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-15 Thread Jules Damji
+ 1 (non-binding) -- Sent from my iPhone Pardon the dumb thumb typos :) > On Sep 15, 2025, at 3:26 PM, Allison Wang wrote: > > > Hi all, > > I would like to start a vote on the SPIP: Add llms.txt files to Spark > Documentation > > Discussion thread: > https://lists.apache.org/thread/7rnhn9

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-15 Thread Hyukjin Kwon
+1 On Tue, 16 Sept 2025 at 09:53, Augusto Vivaldelli < augusto.a.vivalde...@gmail.com> wrote: > +1 > > On Mon, Sep 15, 2025 at 19:38, Gengliang Wang > wrote: >> +1 >> >> On Mon, Sep 15, 2025 at 3:25 PM Allison Wang >> wrote: >>> Hi all, >>> >>> I would like to start a vote on the SP

Re: [VOTE] SPIP: Add llms.txt files to Spark Documentation

2025-09-15 Thread Gengliang Wang
+1 On Mon, Sep 15, 2025 at 3:25 PM Allison Wang wrote: > Hi all, > > I would like to start a vote on the SPIP: Add llms.txt files to Spark > Documentation > > Discussion thread: > https://lists.apache.org/thread/7rnhn9xfl4bgfg0p6mlwo55y5vmpb9f6 > SPIP: > https://docs.google.com/document/d/1tRYdN

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-15 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Sep 12, 2025, at 10:14 PM, Peter Toth wrote: > > Thank you all for the positive feedback! > > On Fri, Sep 12, 2025 at 7:26 AM Jungtaek Lim > wrote: >> +1 sounds like a plan. >> >> On Wed,

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-15 Thread John Zhuge
Thanks for the help! On Mon, Sep 15, 2025 at 5:44 AM DB Tsai wrote: > +1 > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Sep 12, 2025, at 10:14 PM, Peter Toth wrote: > > Thank you all for the positive feedback! > > On Fri, Sep 12, 2025 at 7:26 AM Jungtaek Lim > wrote: > >

Re: [DISCUSS][SPIP] JDBC Driver for Spark Connect

2025-09-13 Thread Fu Chen
Thanks for starting this proposal and sharing the PoC. I think introducing a JDBC driver for Spark Connect will be very helpful for easing the migration from Spark Thrift Server. I look forward to seeing this feature available soon. On 2025/09/08 08:44:07 Martin Grund wrote: > I'm supportive

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-12 Thread Peter Toth
Thank you all for the positive feedback! On Fri, Sep 12, 2025 at 7:26 AM Jungtaek Lim wrote: > +1 sounds like a plan. > > On Wed, Sep 10, 2025 at 6:02 PM Kousuke Saruta wrote: > >> +1 >> >> On Wed, Sep 10, 2025 at 17:44 Max Gekk wrote: >>> +1 >>> >>> On Wed, Sep 10, 2025 at 8:13 AM Shaoyun Chen wrote: >>>

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-12 Thread Jungtaek Lim
+1 sounds like a plan. On Wed, Sep 10, 2025 at 6:02 PM Kousuke Saruta wrote: > +1 > > On Wed, Sep 10, 2025 at 17:44 Max Gekk wrote: >> +1 >> >> On Wed, Sep 10, 2025 at 8:13 AM Shaoyun Chen wrote: >> >>> +1 >>> >>> SPARK-46941[1] also fixed an issue with incorrect results. >>> >>> 1. https://issues.apache.or

Re: Does GraphX accepting patches?

2025-09-11 Thread Enrico Minack
Hi all, maybe this is the right moment to move GraphX into GraphFrames to maintain it there. Cheers, Enrico On 09.09.25 at 13:17, Sem wrote: Hello! Because of deprecation of GraphX in Spark 4.x I have a question. Working on performance improvements in GraphFrames that is using GraphX under

Re: [DISCUSS] Data Type framework

2025-09-11 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Fight Health Insurance: https://www.fighthealthinsurance.com/ Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.yo

Re: [DISCUSS] Data Type framework

2025-09-11 Thread Dongjoon Hyun
Sounds like a great plan! Thank you. +1 for the refactoring. Dongjoon. On Thu, Sep 11, 2025 at 1:04 PM Max Gekk wrote: > Hello Dongjoon, > > > can we do this migration safely in a step-by-step manner over multiple > Apache Spark versions without blocking any Apache Spark releases? > > Sure, we

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-11 Thread Dongjoon Hyun
Thank you all. Apache Spark website is officially updated in order to share Apache Spark 4.1 release plan. https://spark.apache.org/versioning-policy.html Dongjoon. On 2025/09/11 16:42:14 Peter Toth wrote: > Hi, > > Yeah, as we will have 3 preview releases out before the first RC, hopefully >

Re: [DISCUSS] Data Type framework

2025-09-11 Thread Max Gekk
Hello Dongjoon, > can we do this migration safely in a step-by-step manner over multiple Apache Spark versions without blocking any Apache Spark releases? Sure, we can start from the TIME type, and refactor the existing pattern matchings. After that I would support new features of TIME using the f

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-11 Thread Peter Toth
Hi, Yeah, as we will have 3 preview releases out before the first RC, hopefully the RC period won't take that long. Best, Peter On Tue, Sep 9, 2025 at 7:35 AM Dongjoon Hyun wrote: > Hi, Xiao. > > Apache Spark project has a world-wide community which is working on > November and the community a

Re: [DISCUSS] Data Type framework

2025-09-11 Thread Dongjoon Hyun
Thank you for sharing the direction, Max. Since this is internal refactoring, can we do this migration safely in a step-by-step manner over multiple Apache Spark versions without blocking any Apache Spark releases? The proposed direction itself looks reasonable and doable for me. Thanks, Dongj

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Jules Damji
Yes, indeed, one or two llms.txt index manifests wouldn’t hurt, especially if they facilitate LLM searches. Though it is not a standard yet, it’s gaining attention: https://directory.llmstxt.cloud/ Cheers Jules -- Sent from my iPhone Pardon the dumb thumb typos :) On Sep 10, 2025, at 4:11 PM, Hyukjin Kwon

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Dongjoon Hyun
Thank you, Allison and Hyukjin. IIUC, this proposal is not about a single file. The SPIP already exposes multiple files, which may double our documentation and website size (or more in the worst case) because it's simply a duplication of the content. If we start to use AI tools to generate thes

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Hyukjin Kwon
I am +1 if we're sure that it's adding one or only a few files, On Thu, 11 Sept 2025 at 06:53, Denny Lee wrote: > While it is not standard per se, it is quickly becoming a common > approach. And as you noted per MCP site, they have the llms-full.txt, they > also have > https://modelcontextproto

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Bjørn Jørgensen
The llms.txt protocol is not a standard yet. "*To clarify, llms.txt is not meant to be a duplication of the full documentation.*" Some sites, like the Model Context Protocol (MCP) site, put their full web pages in the llms page. h

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Allison Wang
Thanks Dongjoon for raising these concerns. I agree with your point that it’s worth making the lightweight manifest scope explicit in the SPIP so we have a systematic guarantee it stays small (under 10MB). To clarify, llms.txt is not meant to be a duplication of the full documentation. Instead, it

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Wenchen Fan
This should just be an LLM-facing index page of Spark docs? Given the number of APIs Spark provides today, I think this index page should be useful to humans as well. On Wed, Sep 10, 2025 at 10:46 PM Dongjoon Hyun wrote: > Thank you, Allison and Hyukjin. > > IIUC, this proposal is not about a sin
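To give a sense of scale, here is a minimal Python sketch that renders a small llms.txt-style index and enforces the 10MB budget discussed in the thread. The format follows the llmstxt.org proposal: an H1 title, a blockquote summary, then H2 sections of annotated links. The section names, the link list, and reading "10MB" as MiB are illustrative assumptions. They do not reflect the SPIP's actual content.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (label, url, short note); note may be ""
MAX_BYTES = 10 * 1024 * 1024  # "under 10MB" from the thread; MiB is an assumption


def build_llms_txt(title: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render an llms.txt-style index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines += [f"## {name}", ""]
        for label, url, note in links:
            entry = f"- [{label}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)


def check_size(text: str, limit: int = MAX_BYTES) -> int:
    """Return the UTF-8 size in bytes, or raise if it exceeds the budget."""
    size = len(text.encode("utf-8"))
    if size > limit:
        raise ValueError(f"llms.txt is {size} bytes, over the {limit}-byte budget")
    return size
```

Because each entry is one link plus a short note, the file grows with the number of pages rather than with their content, which is the distinction the thread draws between an index and a duplicated copy of the docs.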

Re: [DISCUSS] Data Type framework

2025-09-10 Thread serge rielau . com
I think this is a great idea. There is a significant backlog of types which should be added (e.g. TIMESTAMP(9), TIMESTAMP WITH TIME ZONE, TIME WITH TIME ZONE, some sort of big decimal, to name a few). Making these more "plug and play" is goodness. +1 On Sep 10, 2025, at 1:22 PM, Max Gekk wrote: H

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-10 Thread Max Gekk
+1 On Wed, Sep 10, 2025 at 8:13 AM Shaoyun Chen wrote: > +1 > > SPARK-46941[1] also fixed an issue with incorrect results. > > 1. https://issues.apache.org/jira/browse/SPARK-46941 > > On Wed, Sep 10, 2025 at 11:49 Yang Jie wrote: > > > > +1 > > > > On 2025/09/10 02:32:29 Wenchen Fan wrote: > > > +1 > > > > >

Re: Does GraphX accepting patches?

2025-09-09 Thread Russell Jurney
Yeah, GraphFrames ingesting GraphX sounds like a good idea. If I recall correctly, there are zero issues relating to GraphX in JIRA, so there is not a lot of demand for it there, and it's already deprecated. To ask another question... Sem has been adding property graph support to GraphFrames. One way to bring Graphs t

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Yang Jie
+1 On 2025/09/10 02:32:29 Wenchen Fan wrote: > +1 > > On Wed, Sep 10, 2025 at 4:13 AM Mich Talebzadeh > wrote: > > > Agreed +1 > > Dr Mich Talebzadeh, > > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > > > >view my Linkedin profile > >

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Wenchen Fan
+1 On Wed, Sep 10, 2025 at 4:13 AM Mich Talebzadeh wrote: > Agreed +1 > Dr Mich Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > >view my Linkedin profile > > > > > > > On Tue, 9 Sept 2025 at 16:5

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-09 Thread Allison Wang
Yes, that’s right. It’s essentially just one markdown file to start with, and we can add more later for language or version specific files if needed. On Tue, Sep 9, 2025 at 4:32 PM Hyukjin Kwon wrote: > so it's basically adding one text file for llm, right? I think it's a good > idea. > > On Tue

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-09 Thread Hyukjin Kwon
so it's basically adding one text file for llm, right? I think it's a good idea. On Tue, 9 Sept 2025 at 10:22, Allison Wang wrote: > Hi all, > > I’d like to propose adding llms.txt files to the Spark documentation. > > As more users rely on AI-assisted tools and LLMs to learn, write Spark > code

Re: Can anyone please provide clue why data shuffle is trying to handle 5.1 TB shuffle block?

2025-09-09 Thread Asif Shahid
My thoughts: 1) If one of the tables involved in the join is relatively small, and the plan is not creating a BroadcastHashJoin, then force it to create a BHJ by: a) an explicit hint, or b) increasing the auto broadcast threshold property (make sure you do not set it above 4-6GB, as once 8GB is exceeded you will get e
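In Spark, the two levers above are the `broadcast()` function (or the `/*+ BROADCAST(t) */` SQL hint) and the `spark.sql.autoBroadcastJoinThreshold` setting. The threshold defaults to 10MB, and -1 disables size-based broadcasting. The decision itself can be sketched as a toy Python model, which is not Spark's planner. An explicit hint forces a broadcast hash join. Otherwise, the smaller side's estimated size must be at or under the threshold. The relation that is actually built must also stay under the 8GB cap mentioned above. The size used for planning is only an estimate, so keeping the threshold at 4-6GB leaves headroom. The function names are hypothetical.

```python
GIB = 1024 ** 3
MAX_BROADCAST_BYTES = 8 * GIB  # cap on a built broadcast relation, per the thread


def choose_join_strategy(left_bytes: int, right_bytes: int,
                         threshold_bytes: int, broadcast_hint: bool = False) -> str:
    """Toy equi-join strategy choice from estimated input sizes."""
    if broadcast_hint:
        return "broadcast_hash_join"  # an explicit hint overrides the size threshold
    smaller = min(left_bytes, right_bytes)
    # A negative threshold (e.g. -1) disables size-based broadcasting.
    if threshold_bytes >= 0 and smaller <= threshold_bytes:
        return "broadcast_hash_join"
    return "sort_merge_join"


def check_broadcast_size(actual_bytes: int) -> None:
    """Fail like an oversized broadcast would; the planning estimate may have been too low."""
    if actual_bytes >= MAX_BROADCAST_BYTES:
        raise ValueError(f"cannot broadcast {actual_bytes} bytes (cap {MAX_BROADCAST_BYTES})")
```

With a 500GB table joined to a 2GB table, the default 10MB threshold picks a sort-merge join, while either a hint or a 4GB threshold switches to a broadcast hash join.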

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Mich Talebzadeh
Agreed +1 Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR view my Linkedin profile On Tue, 9 Sept 2025 at 16:51, Peter Toth wrote: > Hi dev list, > > Apache Spark 3.5.6 was released on Ma

Re: Does GraphX accepting patches?

2025-09-09 Thread Mich Talebzadeh
Agreed. will be good HTH Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR view my Linkedin profile On Tue, 9 Sept 2025 at 14:38, Enrico Minack wrote: > Hi all, > > maybe this is the right

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Dongjoon Hyun
+1 Yes, it's a perfect timing to deliver Spark 3.5.7. Thank you for volunteering for it, Peter. Dongjoon. On 2025/09/09 15:49:36 Peter Toth wrote: > Hi dev list, > > Apache Spark 3.5.6 was released on May 29, 2025, so it's been more than 3 > months. > As far as I can see, we have ~40 unrelease

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-09 Thread Dongjoon Hyun
Hi, Xiao. Apache Spark project has a world-wide community which is working on November and the community already decided to put more efforts via the monthly releases. Let me rephrase the community schedule. Apache Spark 4.1.0-preview1 (2025-09-02) Apache Spark 4.1.0-preview2 (2025-10-02) Apache

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-08 Thread Xiao Li
I have the same concerns as Holden regarding the release timeline. Would it make sense to shift our RC to January? Just to clarify, this isn’t an issue with the release manager. The challenge is more about the level of community involvement during the RC stage, and we’ll need stronger engagement f

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-08 Thread Dongjoon Hyun
Thank you, Holden. Yes, it's true and I agree with all your comments. At this time, we are in a much better situation because we have Apache Spark 4.1.0-preview1 already. In addition, I expect Apache Spark 4.1.0-preview2 in October. So, the 4.1.0 release will be smoother than ever. I will volunt

Re: [External] [ANNOUNCE] Announcing Apache Spark 4.1.0-preview1

2025-09-08 Thread Ofir Manor
Hi, I couldn't find the release notes, neither in the announcement nor here: Index of /releases. It is harder to test and experiment without knowing what changed and what's new... It would be great if release notes could be added next round. Just my two cents, Ofi

Re: [DISCUSS][SPIP] JDBC Driver for Spark Connect

2025-09-08 Thread Martin Grund
I'm supportive of the general idea! On Mon, Sep 8, 2025 at 5:47 AM Cheng Pan wrote: > Update: > > I got some questions/responses on the SPIP docs and GitHub PR, > looking forward to more feedback! > > I have discussed offline with Kent Yao, and he will shepherd this SPIP. > > Thanks, > Cheng Pan

Re: [DISCUSS][SPIP] JDBC Driver for Spark Connect

2025-09-07 Thread Cheng Pan
Update: I got some questions/responses on the SPIP docs and GitHub PR, looking forward to more feedback! I have discussed offline with Kent Yao, and he will shepherd this SPIP. Thanks, Cheng Pan > On Sep 4, 2025, at 14:16, Cheng Pan wrote: > > Hi all, > > I’d like to propose introducing a

Re: [ANNOUNCE] Apache Spark 4.0.1 released

2025-09-07 Thread Hyukjin Kwon
Yay! On Sun, 7 Sept 2025 at 13:54, Dongjoon Hyun wrote: > We are happy to announce the availability of Apache Spark 4.0.1! > > Spark 4.0.1 is the first maintenance release based on the branch-4.0 > maintenance branch of Spark. It contains many fixes including security and > correctness domains.

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-06 Thread Jules Damji
+1 (non-binding) -- Sent from my iPhone Pardon the dumb thumb typos :) On Sep 2, 2025, at 6:04 AM, Kent Yao wrote: +1 On Tuesday, Sep 2, 2025, Peter Toth wrote: +1 On Tue, Sep 2, 2025 at 11:49 AM Yang Jie wrote: +1 On 2025/09/02 08:17:17 Max Gekk wrote: > +1 > > On Tue,

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-06 Thread Dongjoon Hyun
>>> > >>> > >>> > >>> Thank you, > >>> > >>> > >>> > >>> Vlad > >>> > >>> > >>> > >>> *From: *Zhou Jiang > >>> *Date: *Tuesday, September 2, 2025

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-04 Thread Szehon Ho
(non-binding) >>> >>> >>> >>> Thank you, >>> >>> >>> >>> Vlad >>> >>> >>> >>> *From: *Zhou Jiang >>> *Date: *Tuesday, September 2, 2025 at 10:10 AM >>> *To: *Anish Shrigonde

Re: [DISCUSS] Enhance JSON Parsing to Support Standard Compliance

2025-09-04 Thread Wenchen Fan
Do we have a list of behaviors we want to change after enabling the new config? On Thu, Sep 4, 2025 at 5:38 PM Philo wrote: > Hi all, > > I am writing to initiate a discussion on enhancing Spark JSON parsing to > support standard compliance. > > ## Motivation > In the current version of Spark, t

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-03 Thread Peter Toth
+1 On Tue, Sep 2, 2025 at 11:49 AM Yang Jie wrote: > +1 > > On 2025/09/02 08:17:17 Max Gekk wrote: > > +1 > > > > On Tue, Sep 2, 2025 at 7:48 AM wrote: > > > > > Please vote on releasing the following candidate as Apache Spark > version > > > 4.0.1. > > > > > > The vote is open until Fri, 05 Se

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-03 Thread huaxin gao
+1 On Tue, Sep 2, 2025 at 8:38 AM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2025/09/02 15:23:55 "L. C. Hsieh" wrote: > > +1 > > > > On Tue, Sep 2, 2025 at 6:08 AM Wenchen Fan wrote: > > > > > > +1 > > > > > > On Tue, Sep 2, 2025 at 1:48 PM wrote: > > >> > > >> Please vote on releasing the

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-03 Thread Holden Karau
< > dongj...@apache.org>, "dev@spark.apache.org" > *Subject: *RE: [EXTERNAL] [VOTE] Release Spark 4.0.1 (RC1) > > > > +1 (non-binding) > > > > On Tue, Sep 2, 2025 at 10:07 AM Anish Shrigondekar > wrote: > > +1 > > > > Thanks, >

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Jungtaek Lim
Vlad > wrote: > >> +1 (non-binding) >> >> >> >> Thank you, >> >> >> >> Vlad >> >> >> >> *From: *Zhou Jiang >> *Date: *Tuesday, September 2, 2025 at 10:10 AM >> *To: *Anish Shrigondekar >> *Cc: *

Re: [ANNOUNCE] Announcing Apache Spark 4.1.0-preview1

2025-09-02 Thread Dongjoon Hyun
Great! Thank you, Hyukjin. Dongjoon. On 2025/09/03 00:31:53 Hyukjin Kwon wrote: > Hi, all. > > To enable wide-scale community testing of the upcoming Spark 4.1.0 release, > the Apache Spark community has posted a Spark 4.1.0-preview1 release >

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Dongjoon Hyun
>> +1 >> >> On Tue, Sep 2, 2025 at 11:56 AM Rozov, Vlad >> wrote: >> >>> +1 (non-binding) >>> >>> >>> >>> Thank you, >>> >>> >>> >>> Vlad >>> >>> >>> >>

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Max Gekk
+1 On Tue, Sep 2, 2025 at 7:48 AM wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.1. > > The vote is open until Fri, 05 Sep 2025 22:47:52 PDT and passes if a > majority +1 PMC votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Kent Yao
+1 On Tuesday, Sep 2, 2025, Peter Toth wrote: > +1 > > On Tue, Sep 2, 2025 at 11:49 AM Yang Jie wrote: > >> +1 >> >> On 2025/09/02 08:17:17 Max Gekk wrote: >> > +1 >> > >> > On Tue, Sep 2, 2025 at 7:48 AM wrote: >> > >> > > Please vote on releasing the following candidate as Apache Spark >> version >> > >

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Jafeer Ali
+1 (non-binding) On Tue, Sep 2, 2025 at 10:19 PM Prashant Singh wrote: > +1 (non-binding) > > Best, > Prashant Singh > > On Tue, Sep 2, 2025 at 6:08 AM John Zhuge wrote: > >> +1 (non-binding) >> >> John Zhuge >> >> >> On Tue, Sep 2, 2025 at 4:04 PM Kent Yao wrote: >> >>> +1 >>> >>> On Sep 2, 2025

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Zhou Jiang
+1 (non-binding) On Tue, Sep 2, 2025 at 10:07 AM Anish Shrigondekar wrote: > +1 > > Thanks, > Anish > > On Tue, Sep 2, 2025 at 8:42 AM huaxin gao wrote: > >> +1 >> >> On Tue, Sep 2, 2025 at 8:38 AM Dongjoon Hyun wrote: >> >>> +1 >>> >>> Dongjoon >>> >>> On 2025/09/02 15:23:55 "L. C. Hsieh" wro

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Yuming Wang
> >> *From: *Zhou Jiang >> *Date: *Tuesday, September 2, 2025 at 10:10 AM >> *To: *Anish Shrigondekar >> *Cc: *huaxin gao , Dongjoon Hyun < >> dongj...@apache.org>, "dev@spark.apache.org" >> *Subject: *RE: [EXTERNAL] [VOTE] Release Spark 4.0.

Re: [Structured Streaming] SST file does not exist. Race condition corrupting state store

2025-09-02 Thread Pedro Miguel Duarte
>>> 1) Configurable padding + N manifests >>>> >>>> - Add two knobs (defaults shown): >>>> >>>> >>>>- stateStore.rocksdb.gc.paddingMs = 12 (HDFS: 60–120s; S3/GCS: >>>>120–300s) >>>>- sta

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread kazuyuki tanimura
> >> >> From: Zhou Jiang mailto:zhou.c.ji...@gmail.com>> >> Date: Tuesday, September 2, 2025 at 10:10 AM >> To: Anish Shrigondekar >> Cc: huaxin gao mailto:huaxin.ga...@gmail.com>>, >> Dongjoon Hyun mailto:dongj...@apache.org>>,

Re: [Structured Streaming] SST file does not exist. Race condition corrupting state store

2025-09-02 Thread B. Micheal Okutubo
ts >>> >>> - Add two knobs (defaults shown): >>> >>> >>>- stateStore.rocksdb.gc.paddingMs = 12 (HDFS: 60–120s; S3/GCS: >>>120–300s) >>>- stateStore.rocksdb.gc.protectedVersions = 3 (union of last N >>>manifests)

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad From: Zhou Jiang Date: Tuesday, September 2, 2025 at 10:10 AM To: Anish Shrigondekar Cc: huaxin gao , Dongjoon Hyun , "dev@spark.apache.org" Subject: RE: [EXTERNAL] [VOTE] Release Spark 4.0.1 (RC1) +1 (non-binding) On Tue, Sep 2, 2025 at 10:0

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Anish Shrigondekar
+1 Thanks, Anish On Tue, Sep 2, 2025 at 8:42 AM huaxin gao wrote: > +1 > > On Tue, Sep 2, 2025 at 8:38 AM Dongjoon Hyun wrote: > >> +1 >> >> Dongjoon >> >> On 2025/09/02 15:23:55 "L. C. Hsieh" wrote: >> > +1 >> > >> > On Tue, Sep 2, 2025 at 6:08 AM Wenchen Fan wrote: >> > > >> > > +1 >> > > >

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Dongjoon Hyun
+1 Dongjoon On 2025/09/02 15:23:55 "L. C. Hsieh" wrote: > +1 > > On Tue, Sep 2, 2025 at 6:08 AM Wenchen Fan wrote: > > > > +1 > > > > On Tue, Sep 2, 2025 at 1:48 PM wrote: > >> > >> Please vote on releasing the following candidate as Apache Spark version > >> 4.0.1. > >> > >> The vote is open

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Prashant Singh
+1 (non-binding) Best, Prashant Singh On Tue, Sep 2, 2025 at 6:08 AM John Zhuge wrote: > +1 (non-binding) > > John Zhuge > > > On Tue, Sep 2, 2025 at 4:04 PM Kent Yao wrote: > >> +1 >> >> On Tuesday, Sep 2, 2025, Peter Toth wrote: >> >>> +1 >>> >>> On Tue, Sep 2, 2025 at 11:49 AM Yang Jie wrote: >>>

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Cheng Pan
+1 (non-binding) Env info: Hadoop 3.4.2, OpenJDK 17, Ubuntu focal arm64 I tested Spark on YARN mode with ESS enabled, and Spark Standalone mode, run some basic queries, everything looks good. Thanks, Cheng Pan > On Sep 2, 2025, at 13:47, dongj...@apache.org wrote: > > Please vote on releasin

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread L. C. Hsieh
+1 On Tue, Sep 2, 2025 at 6:08 AM Wenchen Fan wrote: > > +1 > > On Tue, Sep 2, 2025 at 1:48 PM wrote: >> >> Please vote on releasing the following candidate as Apache Spark version >> 4.0.1. >> >> The vote is open until Fri, 05 Sep 2025 22:47:52 PDT and passes if a >> majority +1 PMC votes are

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread John Zhuge
+1 (non-binding) John Zhuge On Tue, Sep 2, 2025 at 4:04 PM Kent Yao wrote: > +1 > > On Tuesday, Sep 2, 2025, Peter Toth wrote: > >> +1 >> >> On Tue, Sep 2, 2025 at 11:49 AM Yang Jie wrote: >> >>> +1 >>> >>> On 2025/09/02 08:17:17 Max Gekk wrote: >>> > +1 >>> > >>> > On Tue, Sep 2, 2025 at 7:48 AM wrote:

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Kousuke Saruta
+1 On Tue, Sep 2, 2025 at 22:09 Wenchen Fan wrote: > +1 > > On Tue, Sep 2, 2025 at 1:48 PM wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 4.0.1. >> >> The vote is open until Fri, 05 Sep 2025 22:47:52 PDT and passes if a >> majority +1 PMC votes are cast, with >> a minim

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Wenchen Fan
+1 On Tue, Sep 2, 2025 at 1:48 PM wrote: > Please vote on releasing the following candidate as Apache Spark version > 4.0.1. > > The vote is open until Fri, 05 Sep 2025 22:47:52 PDT and passes if a > majority +1 PMC votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package

Re: [VOTE] Release Spark 4.0.1 (RC1)

2025-09-02 Thread Yang Jie
+1 On 2025/09/02 08:17:17 Max Gekk wrote: > +1 > > On Tue, Sep 2, 2025 at 7:48 AM wrote: > > > Please vote on releasing the following candidate as Apache Spark version > > 4.0.1. > > > > The vote is open until Fri, 05 Sep 2025 22:47:52 PDT and passes if a > > majority +1 PMC votes are cast, wit

Re: [Structured Streaming] SST file does not exist. Race condition corrupting state store

2025-08-30 Thread Pedro Miguel Duarte
last N >>manifests) >> >> - Only delete candidates >> >> >>- if:mtime(candidate) + paddingMs < min(mtime(referenced)) (or < now >>- paddingMs) >> >> 2) Final recheck before delete >> >>- Just before deletion, re

Re: [Structured Streaming] SST file does not exist. Race condition corrupting state store

2025-08-29 Thread B. Micheal Okutubo
00s) >- stateStore.rocksdb.gc.protectedVersions = 3 (union of last N >manifests) > > - Only delete candidates > > >- if:mtime(candidate) + paddingMs < min(mtime(referenced)) (or < now >- paddingMs) > > 2) Final recheck before delete > >- J
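The padding rule in the proposal quoted above can be sketched in Python. The rule has three parts: protect the union of the last N manifests; delete an unreferenced SST only if `mtime(candidate) + paddingMs < min(mtime(referenced))`; and recheck just before deleting. This sketch illustrates the proposal and is not Spark's RocksDB state store code. `paddingMs` and `protectedVersions` are knobs proposed in the thread, not existing Spark settings. The function names, the dict-of-mtimes input, and the callbacks are hypothetical. Only the min-referenced-mtime variant is implemented, not the `now - paddingMs` alternative.

```python
from typing import Callable, Dict, List, Set


def gc_candidates(file_mtimes_ms: Dict[str, int], manifests: List[Set[str]],
                  padding_ms: int, protected_versions: int) -> List[str]:
    """Pick unreferenced SST files that are safely older than every referenced file."""
    if protected_versions < 1:
        raise ValueError("protected_versions must be >= 1")
    # Union of files referenced by the last N manifests (list assumed oldest to newest).
    protected: Set[str] = set().union(*manifests[-protected_versions:])
    referenced = [file_mtimes_ms[f] for f in protected if f in file_mtimes_ms]
    if not referenced:
        return []  # nothing to compare against: be conservative and delete nothing
    oldest_referenced = min(referenced)
    return sorted(f for f, mtime in file_mtimes_ms.items()
                  if f not in protected and mtime + padding_ms < oldest_referenced)


def delete_with_recheck(candidates: List[str],
                        load_latest_manifests: Callable[[], List[Set[str]]],
                        delete: Callable[[str], None]) -> List[str]:
    """Re-read the newest manifests right before deleting and skip anything now referenced."""
    still_referenced: Set[str] = set().union(*load_latest_manifests())
    deleted = []
    for f in candidates:
        if f in still_referenced:
            continue  # a concurrent commit picked this file up; keep it
        delete(f)
        deleted.append(f)
    return deleted
```

The padding absorbs clock skew and listing delays on object stores, and the final recheck narrows the window in which a concurrent commit can start referencing a file between selection and deletion.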

Re: Apache Spark 4.0.1 ?

2025-08-28 Thread Dongjoon Hyun
Thank you, Angel, Vlad, Jules, Jungtaek. Dongjoon. On 2025/08/28 05:34:20 Jungtaek Lim wrote: > +1 I was thinking of doing this since we had several major fixes on the new > API transformWithState, but forgot about it. Thanks for raising this! > > On Thu, Aug 28, 2025 at 6:36 AM Jules Damji wro

Re: Apache Spark 4.0.1 ?

2025-08-27 Thread Jungtaek Lim
+1 I was thinking of doing this since we had several major fixes on the new API transformWithState, but forgot about it. Thanks for raising this! On Thu, Aug 28, 2025 at 6:36 AM Jules Damji wrote: > +1 non-binding > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Aug 26, 2025, a

Re: Apache Spark 4.0.1 ?

2025-08-27 Thread Jules Damji
+1 non-binding -- Sent from my iPhone Pardon the dumb thumb typos :) On Aug 26, 2025, at 9:23 AM, Ángel Álvarez Pascua wrote: +1. Thanks @Dongjoon Hyun On Tue, Aug 26, 2025 at 18:20, Dongjoon Hyun wrote: Thank you, Bjorn, Cheng, Kent, Jie, Peter, Wenchen, Anish. I'm starting the

Re: Why Snappy Compression?

2025-08-26 Thread Steve Loughran
w.r.t benchmarks, I'd look at "An Empirical Evaluation of Columnar Storage Formats (Extended Version)", https://arxiv.org/pdf/2304.05028 On Tue, 26 Aug 2025 at 21:45, Nimrod Ofek wrote: > Hi, > > From my experience, and from all the benchmarks I did and read- snappy > provides much bigger fil

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-08-26 Thread Daniel Tenedorio
Hi, I can help review. I helped review the original implementation of HLL sketch aggregate functions into Spark from Ryan Berti earlier. Sorry for not seeing this Spark mailing list thread earlier, I've been out on parental leave for a while (but back now). Best Daniel On 2025/06/04 23:20:16 "
