Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Haejoon Lee
+1

On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote:
> Hi all,
>
> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark Connect)
>
> JIRA
> Prototype
> SPIP doc

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Haejoon Lee
+1

On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang wrote:
> Hi all,
>
> I'd like to start the vote for SPIP: Structured Logging Framework for Apache Spark
>
> References:
> - JIRA ticket
> - SPIP doc

Re: First Time contribution.

2023-09-17 Thread Haejoon Lee
Welcome Ram! :-) I would recommend checking out https://issues.apache.org/jira/browse/SPARK-37935 as a starter task. Refer to https://github.com/apache/spark/pull/41504 and https://github.com/apache/spark/pull/41455 as example PRs. Or you can also add a new sub-task if you find any error

Re: LLM script for error message improvement

2023-08-03 Thread Haejoon Lee
>> I think adding that dev tool script to improve the error message is fine.
>>
>> On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:
>>> Dear contributors, I hope you are doing well!
>>>
>>> I see there are contributors who are i

LLM script for error message improvement

2023-08-02 Thread Haejoon Lee
Dear contributors, I hope you are doing well! I see there are contributors who are interested in working on error message improvements and in contributing persistently, so I want to share an LLM-based error message improvement script to help with your contributions. You can find the details of the script

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Haejoon Lee
Additionally, try deleting the `.idea` directory in the Spark home directory and restarting IntelliJ if it does not work properly after re-building during development. The `.idea` directory stores IntelliJ's project configuration and settings, and is automatically regenerated when IntelliJ is launched.
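The cleanup step above can be sketched as follows (`spark-demo` is a hypothetical checkout directory used only for illustration; in practice you would run the `rm` from the root of your actual Spark checkout):

```shell
# Simulate a Spark checkout containing stale IntelliJ metadata.
mkdir -p spark-demo/.idea

# Remove the .idea directory; IntelliJ regenerates it on the next launch.
rm -rf spark-demo/.idea

# Confirm the directory is gone before restarting IntelliJ.
test ! -d spark-demo/.idea && echo "removed"

# Clean up the demo checkout.
rm -rf spark-demo
```

Deleting `.idea` is safe because it holds only IDE-local state, not project sources.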

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Haejoon Lee
Congrats, Xinrong!!

On Tue, Aug 9, 2022 at 5:12 PM Hyukjin Kwon wrote:
> Hi all,
>
> The Spark PMC recently added Xinrong Meng as a committer on the project.
> Xinrong is the major contributor of PySpark especially Pandas API on Spark.
> She has guided a lot of new contributors

Question using multiple partition for Window cumulative functions when partition is not specified.

2021-08-29 Thread Haejoon Lee
Hi all, I noticed that Spark uses only one partition when performing Window cumulative functions without specifying a partition, so the entire dataset is moved into a single partition, which easily causes OOM or serious performance degradation. See the example below: >>> from pyspark.sql import
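For context, a minimal REPL-style sketch of the scenario described above (the DataFrame and column names here are illustrative, not taken from the original message):

```
>>> from pyspark.sql import SparkSession, Window
>>> from pyspark.sql import functions as F
>>> spark = SparkSession.builder.getOrCreate()
>>> df = spark.range(10).withColumn("v", F.lit(1))
>>> # No partitionBy on the window spec: Spark logs a WARN about
>>> # moving all data into a single partition before computing.
>>> w = Window.orderBy("id")
>>> df.withColumn("cumsum", F.sum("v").over(w)).rdd.getNumPartitions()
1
```

With `orderBy` but no `partitionBy`, the plan uses a single-partition exchange, so the whole dataset is shuffled to one task.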