Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-91359531
@yhuai, thanks for the comments. Regarding your last one, I am wondering if
we can add `Seq[SortOrder]` as a parameter for `Partitioning` and
`Distribution`? instea
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28103653
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -178,6 +179,7 @@ class StatisticsSuite extends QueryTest with
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-91339237
Actually, instead of introducing new `Distribution` and `Partitioning`, how
about we add the following two concepts to a `SparkPlan`.
* `requiredPartitionOrdering:
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-91338697
@yhuai Thanks for the review!
Actually, Jenkins passed when I used true as the default value for
autoSortMergeJoin and then set it to false here. And I agree tha
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28095095
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -144,6 +144,7 @@ class StatisticsSuite extends QueryTest with
Be
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28090938
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apach
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-91324619
@adrian-wang This is a really helpful improvement! Besides my comments in
the code, I have two general comments. First, for this PR, it is fine to add
unnecessary `Exchange
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28089849
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Softwar
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28089723
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Softwar
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28084180
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
---
@@ -163,6 +178,40 @@ case class HashPartitioning(expr
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28082508
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
---
@@ -163,6 +178,40 @@ case class HashPartitioning(expr
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28081373
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
---
@@ -163,6 +178,40 @@ case class HashPartitioning(expr
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28081054
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -178,6 +179,7 @@ class StatisticsSuite extends QueryTest with
BeforeAn
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r28080961
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -144,6 +144,7 @@ class StatisticsSuite extends QueryTest with
BeforeAn
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88839067
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88839061
[Test build #29597 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29597/consoleFull)
for PR 5208 at commit
[`b81f0fe`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88803199
[Test build #29597 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29597/consoleFull)
for PR 5208 at commit
[`b81f0fe`](https://githu
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88776457
Since the behavior is undefined in Scala, I think it is OK to return
anything, as I stated in the comment.
---
If your project is set up for it, you can reply to this em
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27635442
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala
---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache S
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27635315
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -144,6 +145,12 @@ private[sql] class SQLConf extends Serializable {
ge
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27635297
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -75,9 +76,9 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach
{
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88772266
@adrian-wang I left some comments, but I need more time to review the
code of `SortMergeJoin` and will keep adding comments later. BTW, can you double
check i
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27635035
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoin.scala
---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apach
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27634876
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -75,9 +76,9 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach
Github user chenghao-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/5208#discussion_r27634669
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -144,6 +145,12 @@ private[sql] class SQLConf extends Serializable {
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88735424
cc @marmbrus @liancheng @yhuai @chenghao-intel
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88732886
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88732869
[Test build #29584 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29584/consoleFull)
for PR 5208 at commit
[`7a869c5`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88712992
[Test build #29584 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29584/consoleFull)
for PR 5208 at commit
[`7a869c5`](https://githu
Github user chenghao-intel commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88529047
From the log, it seems the output fields of the `PhysicalRDD` changed their
order. Can you rebase against the latest code and try again locally?
```
=
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88530258
Yes, after rebasing I can see this exception.
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88410873
This exception only occurs on the current master. I didn't hit it locally
because I was working on a March-26 master. This could be a potential bug we
introduced during
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88403702
I am not getting this error locally... what's wrong?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88399749
[Test build #29533 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29533/consoleFull)
for PR 5208 at commit
[`f5f81db`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88399763
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88394118
[Test build #29533 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29533/consoleFull)
for PR 5208 at commit
[`f5f81db`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88387039
[Test build #29532 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29532/consoleFull)
for PR 5208 at commit
[`c34c96e`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88387055
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88384073
[Test build #29532 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29532/consoleFull)
for PR 5208 at commit
[`c34c96e`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88380951
[Test build #29530 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29530/consoleFull)
for PR 5208 at commit
[`d7bfe07`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88380959
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-88380636
[Test build #29530 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29530/consoleFull)
for PR 5208 at commit
[`d7bfe07`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87549463
[Test build #29383 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29383/consoleFull)
for PR 5208 at commit
[`6df9f01`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87549480
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87538319
[Test build #29383 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29383/consoleFull)
for PR 5208 at commit
[`6df9f01`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87536978
[Test build #29382 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29382/consoleFull)
for PR 5208 at commit
[`cb1e18d`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87536979
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-87536885
[Test build #29382 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29382/consoleFull)
for PR 5208 at commit
[`cb1e18d`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-86443175
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-86443152
[Test build #29224 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29224/consoleFull)
for PR 5208 at commit
[`b87df90`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5208#issuecomment-86433992
[Test build #29224 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29224/consoleFull)
for PR 5208 at commit
[`b87df90`](https://githu
GitHub user adrian-wang opened a pull request:
https://github.com/apache/spark/pull/5208
[SPARK-2213] [SQL] sort merge join for spark sql
Thanks for the initial work from @Ishiihara in #3173
You can merge this pull request into a Git repository by running:
$ git pull https://g