Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
And it should be generic for HashJoin, not only broadcast join, right? Chrysan Wu 吴晓菊 Phone:+86 17717640807 2018-06-29 10:42 GMT+08:00 吴晓菊 : > Sorry for the mistake. You are right: the output ordering of a broadcast join can > be the order of the big table in some types of join. I will prepare a PR and >

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Sorry for the mistake. You are right: the output ordering of a broadcast join can be the order of the big table in some types of join. I will prepare a PR and let you review it later. Thanks a lot! Chrysan Wu 吴晓菊 Phone:+86 17717640807 2018-06-29 0:00 GMT+08:00 Wenchen Fan : > SortMergeJoin sorts its

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Yep, that's right. There were a bunch of things that were removed from those scripts that made it tricky to build 2.1 (like Scala 2.10 support). I think it's good to keep the scripts working for older releases since that allows us to fix things / add features to them without having to backport to

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Felix Cheung
If I recall, we stopped releasing Hadoop 2.3 or 2.4 builds in newer releases (2.2+?) - that might be why they are not in the release script. From: Marcelo Vanzin Sent: Thursday, June 28, 2018 11:12:45 AM To: Sean Owen Cc: Marcelo Vanzin; dev Subject: Re: [VOTE] Spark 2.1.3

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Alright, uploaded the missing packages. I'll send a PR to update the release scripts just in case... On Thu, Jun 28, 2018 at 10:08 AM, Sean Owen wrote: > If it's easy enough to produce them, I agree you can just add them to the RC > dir. > > On Thu, Jun 28, 2018 at 11:56 AM Marcelo Vanzin >

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-28 Thread Dongjoon Hyun
+1 Tested on CentOS 7.4 and Oracle JDK 1.8.0_171. Bests, Dongjoon. On Thu, Jun 28, 2018 at 7:24 AM Takeshi Yamamuro wrote: > +1 > > I ran tests on an EC2 m4.2xlarge instance; > [ec2-user]$ java -version > openjdk version "1.8.0_171" > OpenJDK Runtime Environment (build 1.8.0_171-b10) > OpenJDK

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Sean Owen
If it's easy enough to produce them, I agree you can just add them to the RC dir. On Thu, Jun 28, 2018 at 11:56 AM Marcelo Vanzin wrote: > I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which > existed in the previous version: >

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
I just noticed this RC is missing builds for Hadoop 2.3 and 2.4, which existed in the previous version: https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/ How important do we think those are? I think I can just build them and publish them to the RC directory without having to create a

Re: Time for 2.3.2?

2018-06-28 Thread Ryan Blue
+1 On Thu, Jun 28, 2018 at 9:34 AM Xiao Li wrote: > +1. Thanks, Saisai! > > The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP. > > Thanks, > > Xiao > > 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro : > >> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.

Re: Time for 2.3.2?

2018-06-28 Thread Xiao Li
+1. Thanks, Saisai! The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP. Thanks, Xiao 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro : > +1, I heard some Spark users have skipped v2.3.1 because of these bugs. > > On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang > wrote: > >> +1

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
BTW that would be a great fix in the docs now that we'll have a 2.3.2 being prepared. On Thu, Jun 28, 2018 at 9:17 AM, Felix Cheung wrote: > Exactly... > > > From: Marcelo Vanzin > Sent: Thursday, June 28, 2018 9:16:08 AM > To: Tom Graves > Cc: Felix Cheung; dev

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Felix Cheung
Exactly... From: Marcelo Vanzin Sent: Thursday, June 28, 2018 9:16:08 AM To: Tom Graves Cc: Felix Cheung; dev Subject: Re: [VOTE] Spark 2.1.3 (RC2) Yeah, we should be more careful with that in general. Like we state that "Spark runs on Java 8+"... On Thu, Jun

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Marcelo Vanzin
Yeah, we should be more careful with that in general. Like we state that "Spark runs on Java 8+"... On Thu, Jun 28, 2018 at 9:13 AM, Tom Graves wrote: > Right, we say we support R 3.1+ but we never actually did, so I agree it's a bug > but it's not a regression since we never really supported them or

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Tom Graves
Right, we say we support R 3.1+ but we never actually did, so I agree it's a bug, but it's not a regression since we never really supported them or tested with them, and it's not a logic or security bug that ends in corruption or bad behavior, so in my opinion it's not a blocker. Again I'm fine with

Re: Time for 2.3.2?

2018-06-28 Thread Felix Cheung
Yep, will do. From: Marcelo Vanzin Sent: Thursday, June 28, 2018 9:04:41 AM To: Felix Cheung Cc: Spark dev list Subject: Re: Time for 2.3.2? Could you mark that bug as blocker and set the target version, in that case? On Thu, Jun 28, 2018 at 8:46 AM, Felix Cheung

Re: Time for 2.3.2?

2018-06-28 Thread Marcelo Vanzin
Could you mark that bug as blocker and set the target version, in that case? On Thu, Jun 28, 2018 at 8:46 AM, Felix Cheung wrote: > +1 > > I’d like to fix SPARK-24535 first though > > -- > *From:* Stavros Kontopoulos > *Sent:* Thursday, June 28, 2018 3:50:34 AM >

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Wenchen Fan
SortMergeJoin sorts its children by join key, but broadcast join does not. I think the output ordering of broadcast join has nothing to do with join key. On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido wrote: > I think the outputOrdering would be the one of the big table (if any) and > it wouldn't
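
The contrast drawn above can be illustrated with a small sketch in plain Python. This is not Spark's SortMergeJoinExec; the function name `sort_merge_join` and the sample rows are hypothetical, and it assumes a single partition and an inner join. Because both inputs are sorted by the join key before merging, the joined rows come out ordered by that key:

```python
def sort_merge_join(left, right, key_l, key_r):
    """Inner join that first sorts both inputs by the join key (illustrative only)."""
    left = sorted(left, key=key_l)      # stable sort by join key
    right = sorted(right, key=key_r)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_l(left[i]), key_r(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key_r(right[j_end]) == kl:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key_l(left[i]) == kl:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


# Hypothetical sample data: (join_key, payload), deliberately unsorted.
left_rows = [(3, "c"), (1, "a"), (2, "b"), (1, "z")]
right_rows = [(2, "B"), (1, "A"), (3, "C")]

smj_result = sort_merge_join(left_rows, right_rows,
                             key_l=lambda r: r[0], key_r=lambda r: r[0])
smj_keys = [l[0] for l, _ in smj_result]
```

Here the ordering of the output follows from the sort on the join key, not from any ordering the inputs had beforehand.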

Re: Time for 2.3.2?

2018-06-28 Thread Felix Cheung
+1 I’d like to fix SPARK-24535 first though From: Stavros Kontopoulos Sent: Thursday, June 28, 2018 3:50:34 AM To: Marco Gaido Cc: Takeshi Yamamuro; Xingbo Jiang; Wenchen Fan; Spark dev list; Saisai Shao; van...@cloudera.com.invalid Subject: Re: Time for 2.3.2?

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
I think the outputOrdering would be that of the big table (if any), and it wouldn't matter whether this involves the join keys or not. Am I wrong? 2018-06-28 17:01 GMT+02:00 吴晓菊 : > Thanks for the reply. > By looking into the SortMergeJoinExec, I think we can follow what > SortMergeJoin does, for

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Felix Cheung
Not pushing back, but our support message has always been R 3.1+, so it is a bit off to say we don’t support newer releases. https://spark.apache.org/docs/2.1.2/ But looking back, this was found during 2.1.2 RC2 and wasn’t fixed (in time) for 2.1.2?

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Thanks for the reply. By looking into the SortMergeJoinExec, I think we can follow what SortMergeJoin does: for some types of join, if the children are ordered on the join keys, we can output the ordered join keys as the output ordering. Chrysan Wu 吴晓菊 Phone:+86 17717640807 2018-06-28 22:53 GMT+08:00

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Wenchen Fan
SortMergeJoin only reports ordering of the join keys, not the output ordering of any child. It seems reasonable to me that broadcast join should respect the output ordering of the children. Feel free to submit a PR to fix it, thanks! On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 wrote: > Why we cannot
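
A minimal sketch of why a hash join can keep the ordering of its streamed (big) side, written in plain Python rather than Spark code. It is not the BroadcastHashJoinExec implementation: the function name `broadcast_hash_join` and the sample rows are hypothetical, and it assumes a single partition and an inner join only. The small side is loaded into a hash table and the big side is scanned once in its existing order, so matches are emitted in that order even when it is unrelated to the join key:

```python
from collections import defaultdict


def broadcast_hash_join(streamed, broadcast, key_s, key_b):
    """Inner join: hash the small side, then probe while scanning the big side."""
    table = defaultdict(list)
    for row in broadcast:
        table[key_b(row)].append(row)

    out = []
    for row in streamed:                      # visited in the streamed side's order
        for match in table.get(key_s(row), ()):
            out.append((row, match))
    return out


# Hypothetical sample data. The big side is sorted by ts, which is NOT the join key.
big = [(1, "k3"), (2, "k1"), (3, "k2"), (4, "k1")]   # (ts, key)
small = [("k1", "x"), ("k3", "z")]                    # (key, value)

bhj_result = broadcast_hash_join(big, small,
                                 key_s=lambda r: r[1], key_b=lambda r: r[0])
bhj_ts = [s[0] for s, _ in bhj_result]
```

In this sketch the row with key "k2" has no match and is dropped, and the remaining rows stay sorted by ts. Whether the same holds for other join types is not shown here.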

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
Why can't we use the output order of the big table? Chrysan Wu Phone:+86 17717640807 2018-06-28 21:48 GMT+08:00 Marco Gaido : > The easy answer to this is that SortMergeJoin ensures an outputOrdering, > while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you > don't know which

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Tom Graves
If this is just supporting newer versions of R that 2.1 never supported then I would say it's not a blocker. But if you feel it's useful enough then I would say it's up to Marcelo if he wants to pull it in and spin another RC. Tom On Wednesday, June 27, 2018, 8:57:25 PM CDT, Felix Cheung

Re: why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread Marco Gaido
The easy answer to this is that SortMergeJoin ensures an outputOrdering, while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you don't know what the order of the output is going to be, since nothing enforces it. Hope this helps. Thanks. Marco 2018-06-28 15:46 GMT+02:00 吴晓菊 : >

why BroadcastHashJoinExec is not implemented with outputOrdering?

2018-06-28 Thread 吴晓菊
We see SortMergeJoinExec is implemented with outputPartitioning and outputOrdering, while BroadcastHashJoinExec is only implemented with outputPartitioning. Why is it designed this way? Chrysan Wu Phone:+86 17717640807

Re: Support SqlStreaming in spark

2018-06-28 Thread JackyLee
Spark JIRA: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630 Benefits: Firstly, users who are unfamiliar with streaming can easily use SQL to run Structured Streaming, especially when migrating offline tasks to real-time processing tasks. Secondly, support SQL API in Structured Streaming

Re: Time for 2.3.2?

2018-06-28 Thread Stavros Kontopoulos
+1 makes sense. On Thu, Jun 28, 2018 at 12:07 PM, Marco Gaido wrote: > +1 too, I'd consider also to include SPARK-24208 if we can solve it > timely... > > 2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro : > >> +1, I heard some Spark users have skipped v2.3.1 because of these bugs. >> >> On Thu, Jun

Re: Time for 2.3.2?

2018-06-28 Thread Marco Gaido
+1 too, I'd also consider including SPARK-24208 if we can solve it in time... 2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro : > +1, I heard some Spark users have skipped v2.3.1 because of these bugs. > > On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang > wrote: > >> +1 >> >> Wenchen Fan, on Jun 28, 2018

Re: Time for 2.3.2?

2018-06-28 Thread Takeshi Yamamuro
+1, I heard some Spark users have skipped v2.3.1 because of these bugs. On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang wrote: > +1 > > Wenchen Fan wrote on Thu, Jun 28, 2018 at 2:06 PM: > >> Hi Saisai, that's great! please go ahead! >> >> On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao >> wrote: >> >>> +1, like

Re: Time for 2.3.2?

2018-06-28 Thread Xingbo Jiang
+1 Wenchen Fan wrote on Thu, Jun 28, 2018 at 2:06 PM: > Hi Saisai, that's great! please go ahead! > > On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao > wrote: > >> +1, as mentioned by Marcelo, these issues seem quite severe. >> >> I can work on the release if short of hands :). >> >> Thanks >> Jerry >> >> >>

Re: Time for 2.3.2?

2018-06-28 Thread Wenchen Fan
Hi Saisai, that's great! please go ahead! On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao wrote: > +1, as mentioned by Marcelo, these issues seem quite severe. > > I can work on the release if short of hands :). > > Thanks > Jerry > > > Marcelo Vanzin wrote on Thu, Jun 28, 2018 at 11:40 AM: > >> +1.