Re: Does GraphX accepting patches?

2025-09-09 Thread Russell Jurney
Yeah, GraphFrames ingesting GraphX sounds like a good idea. There are, if I recall, zero issues relating to GraphX in JIRA, so there is not a lot of demand for it there, and it's already deprecated. To ask another question... Sem has been adding property graph support to GraphFrames. One way to bring Graphs t

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Yang Jie
+1 On 2025/09/10 02:32:29 Wenchen Fan wrote: > +1 > > On Wed, Sep 10, 2025 at 4:13 AM Mich Talebzadeh wrote: > > > Agreed +1 > > Dr Mich Talebzadeh, > > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Wenchen Fan
+1 On Wed, Sep 10, 2025 at 4:13 AM Mich Talebzadeh wrote: > Agreed +1 > Dr Mich Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > > On Tue, 9 Sept 2025 at 16:5

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-09 Thread Allison Wang
Yes, that’s right. It’s essentially just one markdown file to start with, and we can add more later for language- or version-specific files if needed. On Tue, Sep 9, 2025 at 4:32 PM Hyukjin Kwon wrote: > so it's basically adding one text file for llm, right? I think it's a good > idea. > > On Tue

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-09 Thread Hyukjin Kwon
so it's basically adding one text file for llm, right? I think it's a good idea. On Tue, 9 Sept 2025 at 10:22, Allison Wang wrote: > Hi all, > > I’d like to propose adding llms.txt files to the Spark documentation. > > As more users rely on AI-assisted tools and LLMs to learn, write Spark > code

Re: Can anyone please provide clue why data shuffle is trying to handle 5.1 TB shuffle block?

2025-09-09 Thread Asif Shahid
My thoughts: 1) If one of the tables involved in the join is relatively small and the plan is not creating a BroadcastHashJoin, then force it to create a BHJ by: a) an explicit hint, b) increasing the auto broadcast threshold property (make sure you do not set it higher than 4-6 GB, as with 8GB exceeded you will get e
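A minimal sketch of the two options named in this message, assuming the property meant is `spark.sql.autoBroadcastJoinThreshold` and the hint is Spark SQL's `BROADCAST` hint. The helper names, the table names (`events`, `dim_devices`), and the join key (`device_id`) are hypothetical and only illustrate the idea; the 6 GiB ceiling is taken from the upper end of the 4-6 GB range mentioned above.

```python
from typing import Dict

GIB = 1024 ** 3
# Upper end of the 4-6 GB range suggested in the message (an assumption, not a Spark constant).
SAFE_CEILING_BYTES = 6 * GIB


def broadcast_threshold_conf(threshold_bytes: int) -> Dict[str, str]:
    """Build the config entry for option (b), rejecting values above the suggested ceiling."""
    if threshold_bytes > SAFE_CEILING_BYTES:
        raise ValueError("threshold above the suggested 4-6 GB range")
    return {"spark.sql.autoBroadcastJoinThreshold": str(threshold_bytes)}


def broadcast_join_sql(large_table: str, small_table: str, key: str) -> str:
    """Build a join query for option (a): an explicit BROADCAST hint on the small side."""
    return (
        "SELECT /*+ BROADCAST(s) */ * "
        f"FROM {large_table} l JOIN {small_table} s ON l.{key} = s.{key}"
    )


# Hypothetical table and column names, for illustration only.
conf = broadcast_threshold_conf(2 * GIB)
query = broadcast_join_sql("events", "dim_devices", "device_id")
```

In a live session the entries in `conf` could be applied with `spark.conf.set(name, value)` and the query run with `spark.sql(query)`; checking the physical plan with `EXPLAIN` would show whether a BroadcastHashJoin was actually chosen.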

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Mich Talebzadeh
Agreed +1 Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR On Tue, 9 Sept 2025 at 16:51, Peter Toth wrote: > Hi dev list, > > Apache Spark 3.5.6 was released on Ma

Can anyone please provide clue why data shuffle is trying to handle 5.1 TB shuffle block?

2025-09-09 Thread Jason Jun
Hi there, We're joining very large datasets in 5-minute buckets in an on-prem k3 environment. We run into this situation very often. I think the shuffle partition is corrupted, as 5.1 TB doesn't make sense at all. We're running Spark 3.5.2 on on-prem Kubernetes with Spark Operator. So I'd really like to know t

Re: Does GraphX accepting patches?

2025-09-09 Thread Mich Talebzadeh
Agreed. Will be good. HTH Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR On Tue, 9 Sept 2025 at 14:38, Enrico Minack wrote: > Hi all, > > maybe this is the right

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-09 Thread Dongjoon Hyun
+1 Yes, it's perfect timing to deliver Spark 3.5.7. Thank you for volunteering for it, Peter. Dongjoon. On 2025/09/09 15:49:36 Peter Toth wrote: > Hi dev list, > > Apache Spark 3.5.6 was released on May 29, 2025, so it's been more than 3 > months. > As far as I can see, we have ~40 unrelease

Re: SPARK-51166 Prepare Apache Spark 4.1.0 for November 2025

2025-09-09 Thread Dongjoon Hyun
Hi, Xiao. The Apache Spark project has a worldwide community that is working toward November, and the community has already decided to put in more effort via monthly releases. Let me rephrase the community schedule. Apache Spark 4.1.0-preview1 (2025-09-02) Apache Spark 4.1.0-preview2 (2025-10-02) Apache