[ANNOUNCE] Apache Spark 3.0.3 released

2021-06-24 Thread Yi Wu
We are happy to announce the availability of Spark 3.0.3! Spark 3.0.3 is a maintenance release containing stability fixes. This release is based on the branch-3.0 maintenance branch of Spark. We strongly recommend that all 3.0 users upgrade to this stable release. To download Spark 3.0.3, head

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Gengliang Wang
+1 for targeting the renaming for Apache Spark 3.3 at the current phase. On Fri, Jun 25, 2021 at 6:55 AM DB Tsai wrote: > +1 on renaming. > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Jun 24, 2021, at 11:41 AM, Chao Sun wrote: > > Hi, > > As Spark master has upgraded

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread John Zhuge
Thanks Yikun! On Thu, Jun 24, 2021 at 8:54 PM Yikun Jiang wrote: > Hi, folks. > > As @Klaus mentioned, we have done some work on Spark on k8s with Volcano native > support. There has also been some production deployment validation from > our partners in China, like JingDong, XiaoHongShu, VIPshop.

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Yikun Jiang
Hi, folks. As @Klaus mentioned, we have done some work on Spark on k8s with Volcano native support. There has also been some production deployment validation from our partners in China, like JingDong, XiaoHongShu, VIPshop. We will also prepare to propose an initial design and POC[3] on a shared

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread L . C . Hsieh
Thanks Anton. I volunteer to be the shepherd of the SPIP. This is also my first time shepherding a SPIP, so please let me know if there is anything I can improve. These look like great features, and the rationale given in the proposal makes sense. These operations are getting more common and more

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Jungtaek Lim
Meta question: this doesn't target Spark 3.2, right? Many folks have been working on the branch cut for Spark 3.2, so they might be less available to jump into new feature proposals right now. On Fri, Jun 25, 2021 at 9:00 AM Holden Karau wrote: > I took an initial look at the PRs this morning and I’ll go

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Holden Karau
I took an initial look at the PRs this morning and I’ll go through the design doc in more detail but I think these features look great. It’s especially important with the CA regulation changes to make this easier for folks to implement. On Thu, Jun 24, 2021 at 4:54 PM Anton Okolnychyi wrote: >

[DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Anton Okolnychyi
Hey everyone, I'd like to start a discussion on adding support for executing row-level operations such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The execution should be the same across data sources and the best way to do that is to implement it in Spark. Right now, Spark can only

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread DB Tsai
+1 on renaming. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Jun 24, 2021, at 11:41 AM, Chao Sun wrote: > > Hi, > > As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile name > hadoop-3.2 is no longer accurate, and it may confuse Spark users when they

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Dongjoon Hyun
For renaming, I'd target it for Apache Spark 3.3 instead of Apache Spark 3.2 because this is the first release using Apache Hadoop 3.3.1 and we may need to revert Apache Hadoop 3.3.1 during the RC period. Dongjoon. On Thu, Jun 24, 2021 at 12:24 PM Sean Owen wrote: > The downside here is that it

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Sean Owen
The downside here is that it would break downstream builds that set hadoop-3.2 if it's now called hadoop-3. That's not a huge deal. We can retain dummy profiles under the old names that do nothing, but that would be a quieter 'break'. I suppose this naming is only of importance to developers, who

[DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Chao Sun
Hi, As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile name hadoop-3.2 is no longer accurate, and it may confuse Spark users when they realize the actual version is not Hadoop 3.2.x. Therefore, I created https://issues.apache.org/jira/browse/SPARK-33880 to change the profile

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Mich Talebzadeh
Hi Holden, Thank you for your points. I guess, coming from the corporate world, I overlooked how an open source project like Spark leverages resources and interest :). As @KlausMa kindly volunteered, it would be good to hear scheduling ideas on Spark on Kubernetes and of course as I am

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
Hi Mich, I certainly think making Spark on Kubernetes run well is going to be a challenge. However, I think, and I could be wrong about this as well, that in terms of cluster managers Kubernetes is likely to be our future. Talking with people I don't hear about new standalone, YARN or Mesos

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
That's awesome, I'm just starting to get context around Volcano but maybe we can schedule an initial meeting for all of us interested in pursuing this to get on the same page. On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma wrote: > Hi team, > > I'm kube-batch/Volcano founder, and I'm excited to hear

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread John Zhuge
Thanks Klaus! I am interested in more details. On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma wrote: > Hi team, > > I'm kube-batch/Volcano founder, and I'm excited to hear that the spark > community also has such requirements :) > > Volcano provides several features for batch workload, e.g.

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Mich Talebzadeh
Thanks Klaus. That will be great. It would also be helpful if you could elaborate on the need for this feature in line with the limitations of the current batch workload. Regards, Mich