[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-91359531 @yhuai, thanks for the comments. Regarding your last comment, I am wondering if we can add `Seq[SortOrder]` as a parameter for `Partitioning` and `Distribution`? instea

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28103653 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -178,6 +179,7 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-91339237 Actually, instead of introducing new `Distribution` and `Partitioning`, how about we add the following two concepts to a `SparkPlan`. * `requiredPartitionOrdering:

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-91338697 @yhuai Thanks for the review! Actually, this passed Jenkins when I used true as the default value for autoSortMergeJoin, and I then set it to false here. And I agree tha

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28095095 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -144,6 +144,7 @@ class StatisticsSuite extends QueryTest with Be

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28090938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apach

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-91324619 @adrian-wang This is a really helpful improvement! Besides my comments in the code, I have two general comments. First, for this PR, it is fine to add unnecessary `Exchange

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28089849 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28089723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28084180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -163,6 +178,40 @@ case class HashPartitioning(expr

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28082508 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -163,6 +178,40 @@ case class HashPartitioning(expr

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28081373 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -163,6 +178,40 @@ case class HashPartitioning(expr

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28081054 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -178,6 +179,7 @@ class StatisticsSuite extends QueryTest with BeforeAn

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-09 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28080961 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -144,6 +144,7 @@ class StatisticsSuite extends QueryTest with BeforeAn

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88839067 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88839061 [Test build #29597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29597/consoleFull) for PR 5208 at commit [`b81f0fe`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88803199 [Test build #29597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29597/consoleFull) for PR 5208 at commit [`b81f0fe`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88776457 Since the behavior is undefined in Scala, I think it is OK to return anything, as I stated in the comment. --- If your project is set up for it, you can reply to this em

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27635442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache S

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27635315 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -144,6 +145,12 @@ private[sql] class SQLConf extends Serializable { ge

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27635297 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -75,9 +76,9 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach {

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88772266 @adrian-wang I left some comments, but I need some more time to review the code of `SortMergeJoin`; I will keep adding comments later. BTW, can you double check i

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27635035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apach

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27634876 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -75,9 +76,9 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r27634669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -144,6 +145,12 @@ private[sql] class SQLConf extends Serializable {

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88735424 cc @marmbrus @liancheng @yhuai @chenghao-intel

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88732886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88732869 [Test build #29584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29584/consoleFull) for PR 5208 at commit [`7a869c5`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88712992 [Test build #29584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29584/consoleFull) for PR 5208 at commit [`7a869c5`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88529047 From the log, it seems the output fields of the `PhysicalRDD` changed their order. Can you rebase against the latest code and try again locally? ``` =

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88530258 Yes, after rebasing I can see this exception.

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88410873 This exception only exists on current master; I didn't get it locally because I was working on a March 26 master. This could be a potential bug we introduced during

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88403702 I am not getting this error locally... what's wrong?

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88399749 [Test build #29533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29533/consoleFull) for PR 5208 at commit [`f5f81db`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88399763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88394118 [Test build #29533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29533/consoleFull) for PR 5208 at commit [`f5f81db`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88387039 [Test build #29532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29532/consoleFull) for PR 5208 at commit [`c34c96e`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88387055 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88384073 [Test build #29532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29532/consoleFull) for PR 5208 at commit [`c34c96e`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88380951 [Test build #29530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29530/consoleFull) for PR 5208 at commit [`d7bfe07`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88380959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-04-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-88380636 [Test build #29530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29530/consoleFull) for PR 5208 at commit [`d7bfe07`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87549463 [Test build #29383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29383/consoleFull) for PR 5208 at commit [`6df9f01`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87549480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87538319 [Test build #29383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29383/consoleFull) for PR 5208 at commit [`6df9f01`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87536978 [Test build #29382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29382/consoleFull) for PR 5208 at commit [`cb1e18d`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87536979 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-87536885 [Test build #29382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29382/consoleFull) for PR 5208 at commit [`cb1e18d`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-86443175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-86443152 [Test build #29224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29224/consoleFull) for PR 5208 at commit [`b87df90`](https://gith

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-86433992 [Test build #29224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29224/consoleFull) for PR 5208 at commit [`b87df90`](https://githu

[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...

2015-03-26 Thread adrian-wang
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/5208 [SPARK-2213] [SQL] sort merge join for spark sql Thanks for the initial work from @Ishiihara in #3173. You can merge this pull request into a Git repository by running: $ git pull https://g
