Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Takuya UESHIN
+1 On Mon, Jul 8, 2024 at 6:05 PM Yuanjian Li wrote: > +1 > > On Thu, Jul 4, 2024 at 16:54, Hyukjin Kwon wrote: > >> (I will leave this vote open till 10th July, considering that it's holiday >> season in the US) >> >> On Fri, 5 Jul 2024 at 06:12, Martin Grund wrote: >> >>> +1 (non-binding) >>> >>> On Thu, Jul

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Yuanjian Li
+1 On Thu, Jul 4, 2024 at 16:54, Hyukjin Kwon wrote: > (I will leave this vote open till 10th July, considering that it's holiday > season in the US) > > On Fri, 5 Jul 2024 at 06:12, Martin Grund wrote: > >> +1 (non-binding) >> >> On Thu, Jul 4, 2024 at 7:15 PM Holden Karau >> wrote: >> >>> +1 >>> >>> Although

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Ruifeng Zheng
+1 On Sat, Jul 6, 2024 at 4:45 AM bo yang wrote: > +1 This is a great suggestion, thanks Hyukjin! > > > On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon wrote: > >> Alright! let me start the vote! >> >> On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh >> wrote: >> >>> A good point agreed. >>> >>> Mich

Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-08 Thread Pavan Kotikalapudi
Definitely! We internally use it extensively in all our apps and would love to get community feedback. I think we have enough work done to move this feature forward. We had discussion and vote threads already published in the past, but we need enough backing/votes of the PMC members to take it

Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-08 Thread Nimrod Ofek
Hi, Thanks Pavan. I think that the change is very important due to the amount of Spark structured streaming apps running today out there... IMHO this should be introduced in the upcoming Spark 4.0.0 version as an experimental feature for evaluation by the community... What should be the next

Re: [DISCUSS] Auto scaling support for structured streaming

2024-07-08 Thread Pavan Kotikalapudi
Hi, I have taken up the responsibility for the development of that feature right now. Here is the current work https://github.com/apache/spark/pull/42352 last active email thread (maybe you want to reply to this): Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-05 Thread bo yang
+1 This is a great suggestion, thanks Hyukjin! On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon wrote: > Alright! let me start the vote! > > On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh > wrote: > >> A good point agreed. >> >> Mich Talebzadeh, >> Technologist | Architect | Data Engineer |

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-05 Thread huaxin gao
+1 On Fri, Jul 5, 2024 at 12:46 AM Herman van Hovell wrote: > +1 > > On Fri, Jul 5, 2024 at 1:52 AM Hyukjin Kwon wrote: > >> (I will leave this vote open till 10th July, considering that it's holiday >> season in the US) >> >> On Thu, 4 Jul 2024 at 23:39, Peter Toth wrote: >> >>> +1 >>> >>> John

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-05 Thread Herman van Hovell
+1 On Fri, Jul 5, 2024 at 1:52 AM Hyukjin Kwon wrote: > (I will leave this vote open till 10th July, considering that it's holiday > season in the US) > > On Thu, 4 Jul 2024 at 23:39, Peter Toth wrote: > >> +1 >> >> John Zhuge wrote (on Thu, Jul 4, 2024, >> 5:38): >> >>> +1 >>> >>> >>>

Re: Spark decommission

2024-07-04 Thread Arun Ravi
Hi Rajesh, We use it in production at scale. We run Spark on Kubernetes on AWS, and here are the key things that we do: 1) we run the driver on an on-demand node 2) we have configured decommission along with the fallback option to S3; try the latest single-zone S3 for this. 3) We use PVC aware

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
(I will leave this vote open till 10th July, considering that it's holiday season in the US) On Fri, 5 Jul 2024 at 06:12, Martin Grund wrote: > +1 (non-binding) > > On Thu, Jul 4, 2024 at 7:15 PM Holden Karau > wrote: > >> +1 >> >> Although given it's a US holiday maybe keep the vote open for an

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-04 Thread Hyukjin Kwon
(I will leave this vote open till 10th July, considering that it's holiday season in the US) On Thu, 4 Jul 2024 at 23:39, Peter Toth wrote: > +1 > > John Zhuge wrote (on Thu, Jul 4, 2024, > 5:38): > >> +1 >> >> >> John Zhuge >> >> >> On Wed, Jul 3, 2024 at 7:41 PM Gengliang Wang wrote:

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Martin Grund
+1 (non-binding) On Thu, Jul 4, 2024 at 7:15 PM Holden Karau wrote: > +1 > > Although given it's a US holiday maybe keep the vote open for an extra day? > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Holden Karau
+1 Although given it's a US holiday maybe keep the vote open for an extra day? Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Thu,

[DISCUSS] Auto scaling support for structured streaming

2024-07-04 Thread Nimrod Ofek
Hi, I remember there was a discussion about better supporting auto scaling for structured streaming. Is there anything happening with that for the upcoming Spark 4.0 release? Will there be support for auto scaling (at least on K8s) spark structured streaming apps? Thanks, Nimrod

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
+1 non-binding Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime PhD Imperial College London London, United Kingdom

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-04 Thread Peter Toth
+1 John Zhuge wrote (on Thu, Jul 4, 2024, 5:38): > +1 > > > John Zhuge > > > On Wed, Jul 3, 2024 at 7:41 PM Gengliang Wang wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin >> wrote: >> >>> +1 >>> >>> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: >>> +1

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Denny Lee
+1 (non-binding) On Thu, Jul 4, 2024 at 19:13 Hyukjin Kwon wrote: > Hi all, > > I’d like to start a vote for allowing GitHub Actions runs for > contributors' PRs without approvals in apache/spark-connect-go. > > Please also refer to: > >- Discussion thread: >

[VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
Hi all, I’d like to start a vote for allowing GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go. Please also refer to: - Discussion thread: https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420 - JIRA ticket:

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
Alright! let me start the vote! On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh wrote: > A good point agreed. > > Mich Talebzadeh, > Technologist | Architect | Data Engineer | Generative AI | FinCrime > PhD Imperial College > London

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
A good point agreed. Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime PhD Imperial College London London, United Kingdom

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-03 Thread Martin Grund
Absolutely we should do that. I thought that the default rule was inclusive already so that once folks have their first contribution it would automatically allow kicking off the workflows. On Thu, Jul 4, 2024 at 04:20 Matthew Powers wrote: > Yea, this would be great. > > spark-connect-go is

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread John Zhuge
+1 John Zhuge On Wed, Jul 3, 2024 at 7:41 PM Gengliang Wang wrote: > +1 > > On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin > wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >>> wrote: >>> > >>> > +1 >>> >

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Chao Sun
+1 On Wed, Jul 3, 2024 at 6:24 PM Jungtaek Lim wrote: > +1 (non-binding) > > Thanks! > > On Thu, Jul 4, 2024 at 8:48 AM Reynold Xin > wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >>> wrote: >>> > >>> >

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Wenchen Fan
+1 On Thu, Jul 4, 2024 at 10:41 AM Gengliang Wang wrote: > +1 > > On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin > wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >>> wrote: >>> > >>> > +1 >>> > >>> > Dongjoon

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-03 Thread Matthew Powers
Yea, this would be great. spark-connect-go is still experimental and anything we can do to get it production grade would be a great step IMO. The Go community is excited to write Spark... with Go! On Wed, Jul 3, 2024 at 8:49 PM Hyukjin Kwon wrote: > Hi all, > > The Spark Connect Go client

[DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-03 Thread Hyukjin Kwon
Hi all, The Spark Connect Go client repository (https://github.com/apache/spark-connect-go) requires GitHub Actions runs for individual commits within contributors' PRs. This policy was intentionally applied (https://issues.apache.org/jira/browse/INFRA-24387), but we can change this default

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Jungtaek Lim
+1 (non-binding) Thanks! On Thu, Jul 4, 2024 at 8:48 AM Reynold Xin wrote: > +1 > > On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >> wrote: >> > >> > +1 >> > >> > Dongjoon >> > >> > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Gengliang Wang
+1 On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin wrote: > +1 > > On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >> wrote: >> > >> > +1 >> > >> > Dongjoon >> > >> > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng wrote: >> >> >> >> +1 >> >>

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Reynold Xin
+1 On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: > +1 > > On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun > wrote: > > > > +1 > > > > Dongjoon > > > > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng wrote: > >> > >> +1 > >> > >> Thank you @Hyukjin Kwon ! > >> > >> On Wed, Jul 3, 2024 at 8:55 AM bo

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread L. C. Hsieh
+1 On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun wrote: > > +1 > > Dongjoon > > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng wrote: >> >> +1 >> >> Thank you @Hyukjin Kwon ! >> >> On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: >>> >>> +1 (non-binding) >>> >>> >>> On Tue, Jul 2, 2024 at 11:22 PM

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Dongjoon Hyun
+1 Dongjoon On Wed, Jul 3, 2024 at 10:58 Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwon ! > > On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: > >> +1 (non-binding) >> > >> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Cheng Pan >>> >>> >>>

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Matthew Powers
+1 (non-binding) Thanks! On Wed, Jul 3, 2024 at 1:58 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwon ! > > On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: > >> +1 (non-binding) >> >> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Cheng Pan

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Takuya UESHIN
+1 On Wed, Jul 3, 2024 at 10:58 AM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwon ! > > On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: > >> +1 (non-binding) >> >> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Cheng Pan >>> >>> >>> On Jul 3,

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Xinrong Meng
+1 Thank you @Hyukjin Kwon ! On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote: > +1 (non-binding) > > On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: > >> +1 (non-binding) >> >> Thanks, >> Cheng Pan >> >> >> On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote: >> >> Hi all, >> >> I’d like to start a

Re: Deploying Spark on Kubernetes Operator

2024-07-03 Thread L. C. Hsieh
Thanks for being interested in the Spark Kubernetes Operator. Because the initial PR is large, it has been split into several PRs that are easier to review and merge. It seems the initial series of PRs to merge the code into the repo is not done yet. For example, you can see there is a PR to add the

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread bo yang
+1 (non-binding) On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote: > +1 (non-binding) > > Thanks, > Cheng Pan > > > On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote: > > Hi all, > > I’d like to start a vote for moving Spark Connect server to builtin > package (Client API layer stays external). > >

Re: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Kent Yao
+1 (non-binding), Kent On Wed, Jul 3, 2024 at 14:11, Martin Grund wrote: > > +1 (non-binding) > > On Wed, Jul 3, 2024 at 07:25 Holden Karau wrote: >> >> +1 >> >> Twitter: https://twitter.com/holdenkarau >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 >> YouTube Live Streams:

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Cheng Pan
+1 (non-binding) Thanks, Cheng Pan > On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote: > > Hi all, > > I’d like to start a vote for moving Spark Connect server to builtin package > (Client API layer stays external). > > Please also refer to: > >- Discussion thread: >

Re: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Martin Grund
+1 (non-binding) On Wed, Jul 3, 2024 at 07:25 Holden Karau wrote: > +1 > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > > >

Re: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Tue, Jul 2, 2024 at 10:18 PM yangjie01 wrote: > +1 (non-binding) > > > >

Re: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread yangjie01
+1 (non-binding) From: Denny Lee Date: Wednesday, July 3, 2024 09:12 To: Hyukjin Kwon Cc: dev Subject: [External Mail] Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external) +1 (non-binding) On Wed, Jul 3, 2024 at 9:11 AM Hyukjin Kwon <gurwls...@apache.org> wrote: Starting

Deploying Spark on Kubernetes Operator

2024-07-02 Thread Liu Fangxu
Hello Spark team, I hope this email finds you well. I am writing on behalf of my team, which is looking to deploy a Spark Kubernetes Operator in our cluster. We have come across this latest initiative on GitHub to develop a Java based

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Denny Lee
+1 (non-binding) On Wed, Jul 3, 2024 at 9:11 AM Hyukjin Kwon wrote: > Starting with my own +1. > > On Wed, 3 Jul 2024 at 09:59, Hyukjin Kwon wrote: > >> Hi all, >> >> I’d like to start a vote for moving Spark Connect server to builtin >> package (Client API layer stays external). >> >> Please

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Hyukjin Kwon
Starting with my own +1. On Wed, 3 Jul 2024 at 09:59, Hyukjin Kwon wrote: > Hi all, > > I’d like to start a vote for moving Spark Connect server to builtin > package (Client API layer stays external). > > Please also refer to: > >- Discussion thread: >

[VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Hyukjin Kwon
Hi all, I’d like to start a vote for moving Spark Connect server to builtin package (Client API layer stays external). Please also refer to: - Discussion thread: https://lists.apache.org/thread/odlx9b552dp8yllhrdlp24pf9m9s4tmx - JIRA ticket:

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Hyukjin Kwon
Alrighty, let me start the vote to make sure everybody is happy :-). On Wed, 3 Jul 2024 at 09:55, Hyukjin Kwon wrote: > It will be fine for non-connect users. When we are actually moving client > one, I think we should go with an SPIP cuz that might affect end users > > On Tue, 2 Jul 2024

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Hyukjin Kwon
It will be fine for non-connect users. When we are actually moving client one, I think we should go with an SPIP cuz that might affect end users On Tue, 2 Jul 2024 at 23:05, Holden Karau wrote: > I guess my one concern here would be are we going to expand the > dependencies that are visible

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Holden Karau
I guess my one concern here would be are we going to expand the dependencies that are visible on the class path for non-connect users? One of the pain points that folks experienced with upgrading can be from those changing. Otherwise this seems pretty reasonable. Twitter:

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Matthew Powers
This is a great idea and would be a great quality of life improvement. +1 (non-binding) On Tue, Jul 2, 2024 at 4:56 AM Hyukjin Kwon wrote: > > while leaving the connect jvm client in a separate folder looks weird > > I plan to actually put it at the top level together but I feel like this >

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Hyukjin Kwon
> while leaving the connect jvm client in a separate folder looks weird I plan to actually put it at the top level together but I feel like this has to be done with SPIP so I am moving internal server side first orthogonally On Tue, 2 Jul 2024 at 17:54, Cheng Pan wrote: > Thanks for raising

Re: [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Cheng Pan
Thanks for raising this discussion, I think putting the connect folder on the top level is a good idea to promote Spark Connect, while leaving the connect jvm client in a separate folder looks weird. I suppose there is no contract to leave all optional modules under `connector`? e.g.

Re: [External Mail] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-02 Thread Martin Grund
+1 On Tue, Jul 2, 2024 at 7:19 AM yangjie01 wrote: > I have manually attempted to only modify the `assembly/pom.xml` and > examined the results of executing `dev/make-distribution.sh --tgz`. The > `spark-connect_2.13-4.0.0-SNAPSHOT.jar` is indeed included in the jars > directory. However, if

Re: [External Mail] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread yangjie01
I have manually attempted to only modify the `assembly/pom.xml` and examined the results of executing `dev/make-distribution.sh --tgz`. The `spark-connect_2.13-4.0.0-SNAPSHOT.jar` is indeed included in the jars directory. However, if rearranging the directories would result in a clearer

Re: [External Mail] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread Hyukjin Kwon
My concern is that the `connector` directory is really for external/optional packages (and they aren't included in assembly IIRC), so I am hesitant to just change the assembly. The actual changes are not quite large but they move the files around. On Tue, 2 Jul 2024 at 12:23, yangjie01 wrote:

Re: [External Mail] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread yangjie01
I'm supportive of this initiative. However, if the purpose is just to avoid the additional `--packages` option, it seems that making some adjustments to the `assembly/pom.xml` could potentially meet our goal. Is it really necessary to restructure the code directory? Jie Yang From: Hyukjin Kwon

[DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread Hyukjin Kwon
Hi all, I would like to discuss moving Spark Connect server to builtin package. Right now, users have to specify `--packages` when they run the Spark Connect server script, for example: ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` or

Spark decommission

2024-06-26 Thread Rajesh Mahindra
Hi folks, I am planning to leverage the "Spark Decommission" feature in production since our company uses SPOT instances on Kubernetes. I wanted to get a sense of how stable the feature is for production usage and if anyone has thoughts around trying it out in production, especially in

Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread George Magiros
Thank you all so much for the kind words of encouragement on my first test report. As a follow up, I ran all my HDFS and Yarn nodes on Java 8 - including my Nodemanagers. I then modified Spark's conf/spark-defaults.conf according to Mr. Pan's prior post, and it worked: I was able to submit

Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread Cheng Pan
FYI, I have submitted SPARK-48651(https://github.com/apache/spark/pull/47010) to update the Spark on YARN docs for JDK configuration, looking forward to your feedback. Thanks, Cheng Pan > On Jun 18, 2024, at 02:00, George Magiros wrote: > > I successfully submitted and ran

unsubscribe

2024-06-18 Thread Cenk Ariöz
unsubscribe

Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Cheng Pan
You don’t need to upgrade Java for HDFS and YARN. Just keep using Java 8 for Hadoop and set JAVA_HOME to Java 17 for Spark applications[1]. 0. Install Java 17 on all nodes, for example, under /opt/openjdk-17 1. Modify $SPARK_CONF_DIR/spark-env.sh export JAVA_HOME=/opt/openjdk-17 2. Modify
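For reference, a minimal shell sketch of how that setup could look, assuming a scratch directory stands in for $SPARK_CONF_DIR. Only the spark-env.sh line comes from the message above. The message is cut off at step 2, so the two spark-defaults.conf entries are an assumption based on Spark's documented `spark.yarn.appMasterEnv.[Name]` and `spark.executorEnv.[Name]` settings (a later reply in this thread does mention editing conf/spark-defaults.conf), and may not match what was actually posted.

```shell
#!/bin/sh
# Sketch only: writes into a scratch directory instead of a real Spark install.
set -eu

# Hypothetical stand-in for $SPARK_CONF_DIR; point it at the real directory to apply.
CONF_DIR="$(mktemp -d)"
# Java 17 install path as given in the message above.
JDK17_HOME=/opt/openjdk-17

# Step 1 (from the message): Spark's own launch scripts pick up Java 17.
echo "export JAVA_HOME=${JDK17_HOME}" >> "${CONF_DIR}/spark-env.sh"

# Assumption (the message is truncated here): also point the YARN application
# master and the executors at Java 17, while Hadoop itself stays on Java 8.
cat >> "${CONF_DIR}/spark-defaults.conf" <<EOF
spark.yarn.appMasterEnv.JAVA_HOME  ${JDK17_HOME}
spark.executorEnv.JAVA_HOME        ${JDK17_HOME}
EOF

echo "wrote config to ${CONF_DIR}"
```

Writing to a temporary directory keeps the sketch safe to run anywhere; the resulting files can be diffed against an existing configuration before anything is copied over.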

Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Wenchen Fan
Thanks for sharing! Yea Spark 4.0 is built using Java 17. On Tue, Jun 18, 2024 at 5:07 AM George Magiros wrote: > I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn > using 4.0.0-preview1. However I got it to work only after fixing an issue > with the Yarn nodemanagers

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-17 Thread Allison Wang
I'm a big +1 on this proposal. We should be able to continue improving the programming guides to enhance their quality and make this process easier. > Move the programming guide to the spark-website repo, to allow faster iterations and releases This is a great idea. It should work for structured

4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread George Magiros
I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn using 4.0.0-preview1. However, I got it to work only after fixing an issue with the Yarn nodemanagers (Hadoop v3.3.6 and v3.4.0). Namely the issue was: 1. If the nodemanagers used Java 11, Yarn threw an error about not

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread serge rielau . com
I think some of the issues raised here are not really common. Examples should follow best practice. It would be odd to have an example that exploits ansi.enabled=false to e.g. overflow an integer. Instead an example that works with ansi mode will typically work perfectly fine in an older

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
Shall we decouple these two decisions? - Move the programming guide to the spark-website repo, to allow faster iterations and releases - Make programming guide version-less I think the downside of moving the programming guide to the spark-website repo is almost negligible: you may need

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Neil Ramaswamy
There are two issues and one main benefit that I see with versioned programming guides: - *Issue 1*: We often retroactively realize that code snippets have bugs and explanations are confusing (see examples: dropDuplicates ,

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
Just FYI, the Hive languages manual is also version-less: https://cwiki.apache.org/confluence/display/Hive/LanguageManual It's not a strong data point as this doc is not actively updated, but my personal feeling is that it's nice to see the history of a feature: when it was introduced, when it

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Nimrod Ofek
My personal opinion is that having the documents per version (current and previous), without fixing previous versions - just keeping them as a snapshot in time of the current documentation once the new version was released, should be good enough. Because now Neil would like to change the

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Nicholas Chammas
I will let Neil and Matt clarify the details because I believe they understand the overall picture better. However, I would like to emphasize something that motivated this effort and which may be getting lost in the concerns about versioned vs. versionless docs. The main problem is that some

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Mridul Muralidharan
Hi, Versioned documentation has the benefit that users can have reasonable confidence that features, functionality and examples mentioned will work with that released Spark version. A versionless guide runs into potential issues with deprecation, behavioral changes and new features. My concern

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Hyukjin Kwon
I am +1 on this but as you guys mentioned, we should really be clear on how to address different versions. On Wed, 5 Jun 2024 at 18:27, Matthew Powers wrote: > I am a huge fan of the Apache Spark docs and I regularly look at the > analytics on this page >

Re: [External] Re: push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-07 Thread Ofir Manor
Hi Ye - I am running Spark on K8S, looking to see if someone made external shuffle service on K8S work in their environment (ex: with some out-of-tree patches or hacks), as the push-based variant seems like it would be a great fit for me. Ofir From: Ye Zhou

Re: push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Ye Zhou
Hi Ofir. Right now, the push-based shuffle within Spark is only supported for Spark on YARN, with the external shuffle service running as an auxiliary service in NodeManager, but not natively on K8s. As far as I know, there are no recent plans to add the support for Spark on K8s natively. For question

Re: push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Keyong Zhou
Hi Ofir, I can provide some information about use cases for Apache Celeborn. Apache Celeborn can be deployed on K8s and standalone; both are widely used in production environments by users. The largest cluster I know contains more than 1,000 Celeborn workers. Celeborn is especially beneficial for

push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Ofir Manor
Hi, Regarding the external shuffle service on K8S and especially the push-based variant that was merged in 3.2: 1. Are there plans to make it supported and work out-of-the-box in 4.0? 2. Did anyone make it work for themselves in 3.5 or earlier? If so, can you share your experience and what

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Matthew Powers
I am a huge fan of the Apache Spark docs and I regularly look at the analytics on this page to see how well they are doing. Great work to everyone that's contributed to the

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Neil Ramaswamy
Thanks all for the responses. Let me try to address everything. > the programming guides are also different between versions since features are being added, configs are being added/removed/changed, defaults are being changed etc. I agree that this is the case. But I think it's fine to mention

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Wenchen Fan
I agree with the idea of a versionless programming guide. But one thing we need to make sure of is we give clear messages for things that are only available in a new version. My proposal is: 1. keep the old versions' programming guide unchanged. For example, people can still access

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Martin Andersson
While I have no practical knowledge of how documentation is maintained in the Spark project, I must agree with Nimrod. For users on older versions, having a programming guide that refers to features or API methods that do not exist in that version is confusing and detrimental. Surely there

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Nimrod Ofek
Hi Neil, While you wrote you don't mean the API docs (of course), the programming guides are also different between versions since features are being added, configs are being added/removed/changed, defaults are being changed etc. I know of "backport hell" - which is why I wrote that once a

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Neil Ramaswamy
Hi Nimrod, Quick clarification: my proposal will not touch API-specific documentation for the specific reasons you mentioned (signatures, behavior, etc.). It just aims to make the *programming guides* versionless. Programming guides should teach fundamentals of Spark, and the fundamentals of Spark

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Nimrod Ofek
Hi, While I think that the documentation needs a lot of improvement and important details are missing - and detaching the documentation from the main project can help iterate faster on documentation-specific tasks, I don't think we can or should move to versionless documentation.

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Praveen Gattu
+1. This helps achieve greater velocity in improving docs. However, we might still need a way to provide version-specific information, i.e. which features are available in which version, etc. On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy wrote: > Hi all, > > I've written up a proposal to

[ANNOUNCE] Announcing Apache Spark 4.0.0-preview1

2024-06-03 Thread Wenchen Fan
Hi all, To enable wide-scale community testing of the upcoming Spark 4.0 release, the Apache Spark community has posted a preview release of Spark 4.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code

[DISCUSS] Variant shredding specification

2024-06-03 Thread Gene Pang
Hi all, We have been working on the Variant data type, which is designed to store and process semi-structured data efficiently, even with heterogeneous values. Users can store and process semi-structured data in a flexible way, without having to specify or know any fixed schema on write. Variant

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-06-02 Thread Wenchen Fan
The vote passes with 6 +1s (4 binding +1s). (* = binding) +1: Wenchen Fan (*) Kent Yao Cheng Pan Xiao Li (*) Gengliang Wang (*) Tathagata Das (*) Thanks all! On Fri, May 31, 2024 at 6:07 PM Tathagata Das wrote: > +1 > - Tested RC3 with Delta Lake. All our Scala and Python tests pass. > > On

Unsubscribe

2024-05-31 Thread Ashish Singh

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-31 Thread Tathagata Das
+1 - Tested RC3 with Delta Lake. All our Scala and Python tests pass. On Fri, May 31, 2024 at 3:24 PM Xiao Li wrote: > +1 > > On Thu, May 30, 2024 at 09:48, Cheng Pan wrote: > >> +1 (non-binding) >> >> - All links are valid >> - Run some basic queries using YARN client mode with Apache Hadoop v3.3.6, >> HMS

Unsubscribe

2024-05-31 Thread Ashish
Sent from my iPhone - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-31 Thread Gengliang Wang
+1 On Fri, May 31, 2024 at 11:06 AM Xiao Li wrote: > +1 > > On Thu, May 30, 2024 at 09:48, Cheng Pan wrote: > >> +1 (non-binding) >> >> - All links are valid >> - Run some basic queries using YARN client mode with Apache Hadoop v3.3.6, >> HMS 2.3.9 >> - Pass integration tests with Apache Kyuubi v1.9.1 RC0 >>

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-31 Thread Xiao Li
+1 On Thu, May 30, 2024 at 09:48, Cheng Pan wrote: > +1 (non-binding) > > - All links are valid > - Run some basic queries using YARN client mode with Apache Hadoop v3.3.6, > HMS 2.3.9 > - Pass integration tests with Apache Kyuubi v1.9.1 RC0 > > Thanks, > Cheng Pan > > > On May 29, 2024, at 02:48, Wenchen Fan

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-30 Thread Cheng Pan
+1 (non-binding) - All links are valid - Run some basic queries using YARN client mode with Apache Hadoop v3.3.6, HMS 2.3.9 - Pass integration tests with Apache Kyuubi v1.9.1 RC0 Thanks, Cheng Pan > On May 29, 2024, at 02:48, Wenchen Fan wrote: > > Please vote on releasing the following

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-30 Thread Kent Yao
+1 (non-binding), I have checked: - Download links are fine - Signatures and integrities are fine - Build from source - run-example runs successfully with some example code - No blocking issues from my side - Duplicated jars[1][2] found in both hive-jackson and examples/jars, the latter seems not

Unsubscribe

2024-05-29 Thread Jang tao
Unsubscribe

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-28 Thread Wenchen Fan
Hi all, I've created a PR to put the behavior change guideline on the Spark website: https://github.com/apache/spark-website/pull/518 . Please leave comments if you have any, thanks! On Wed, May 15, 2024 at 1:41 AM Wenchen Fan wrote: > Thanks all for the feedback here! Let me put up a new

unsubscribe

2024-05-28 Thread Lucas De Jaeger
unsubscribe

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-28 Thread Wenchen Fan
One correction: "The tag to be voted on is v4.0.0-preview1-rc2 (commit 7cfe5a6e44e8d7079ae29ad3e2cee7231cd3dc66)" should be "The tag to be voted on is v4.0.0-preview1-rc3 (commit 7a7a8bc4bab591ac8b98b2630b38c57adf619b82):" On Tue, May 28, 2024 at 11:48 AM Wenchen Fan wrote: > Please vote on

[VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-28 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0-preview1. The vote is open until May 31 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0-preview1 [ ] -1 Do not release this package
