Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
Hi, All. Currently, there is only one correctness issue targeting 2.4.5. SPARK-28344 Fail the query if detect ambiguous self join -> Duplicated by SPARK-10892 Join with Data Frame returns wrong results SPARK-27547 fix DataFrame self-join problems

Re: Closing stale PRs with a GitHub Action

2020-01-27 Thread Hyukjin Kwon
Thanks for doing this, Nicholas. On Tue, Jan 28, 2020 at 8:15 AM, Nicholas Chammas wrote: > A brief update here: At the start of December when I started this thread > we had almost 500 open PRs. Now that the Stale workflow has had time to > catch up, we're down to ~280 open PRs. > > More impressive than

Re: More publicly documenting the options under spark.sql.*

2020-01-27 Thread Nicholas Chammas
I am! Thanks for the reference. On Thu, Jan 16, 2020 at 9:53 PM Hyukjin Kwon wrote: > Nicholas, are you interested in taking a stab at this? You could refer to > https://github.com/apache/spark/commit/60472dbfd97acfd6c4420a13f9b32bc9d84219f3 > > On Fri, Jan 17, 2020 at 8:48 AM, Takeshi Yamamuro wrote: >

Re: Closing stale PRs with a GitHub Action

2020-01-27 Thread Nicholas Chammas
A brief update here: At the start of December when I started this thread we had almost 500 open PRs. Now that the Stale workflow has had time to catch up, we're down to ~280 open PRs. More impressive than the number of stale PRs that got closed

Re: Enabling push-based shuffle in Spark

2020-01-27 Thread Long, Andrew
The easiest would be to create a fork of the code on GitHub. I can also accept diffs. Cheers Andrew From: Min Shen Date: Monday, January 27, 2020 at 12:48 PM To: "Long, Andrew", "dev@spark.apache.org" Subject: Re: Enabling push-based shuffle in Spark Hi Andrew, We are leveraging

Re: Enabling push-based shuffle in Spark

2020-01-27 Thread Min Shen
Hi Andrew, We are leveraging SPARK-6237 to control the off-heap memory consumption due to Netty. With that change, the data is processed in a streaming fashion so Netty does not buffer an entire RPC in memory before handing it over to RPCHandler. We tested with our internal stress testing

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Dongjoon Hyun
Yes. That is what I pointed out in `Unfortunately, we didn't build a consensus on what is really blocked by that.` If you are suggesting a vote, do you mean a majority-win vote or a unanimous decision? Will it be a permanent decision? > I think the other interesting thing here is how exactly to come

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-27 Thread Jim Kleckner
Sure, it seems like an optional thing to me. Spark has a Jenkins setup for building and testing. This would only affect someone who pushes the code to GitLab. I'm happy to keep the commit in a small private branch of my own that I apply when I need to build an out-of-cycle build. I just

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Tom Graves
Thanks for bringing this up. A) I'm not clear on this one as to why affected and target would be different initially, other than the reasons target versions != fixed versions. Is the intention here just to say, if it's already been discussed and consensus was reached that it is not needed in a certain release?