[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-09-03 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-137578046 I am closing it for now. Will reopen it when I get a chance to work on it. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-09-03 Thread yhuai
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/7886

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/7886#discussion_r36500358 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -170,14 +170,57 @@ class PlannerSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/7886#discussion_r36500113 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashOuterJoin.scala --- @@ -42,12 +42,23 @@ case class

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/7886#discussion_r36534848 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -170,14 +170,57 @@ class PlannerSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127280985 Merged build triggered.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127281010 Merged build started.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127282231 [Test build #39553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39553/consoleFull) for PR 7886 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127135250 [Test build #39520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39520/consoleFull) for PR 7886 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127135112 Merged build started.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127135103 Merged build triggered.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread yhuai
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/7886 [SPARK-7871] [SQL] Improve the outputPartitioning for outer joins. https://issues.apache.org/jira/browse/SPARK-7871 This PR adds the concept of `nullSafe` to `ClusteredDistribution` and

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127151587 [Test build #39520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39520/console) for PR 7886 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127151697 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127338471 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127338200 [Test build #39553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39553/console) for PR 7886 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127402132 I'd like to try to review this now since I think it's going to conflict with the SMJ outer join patch.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127402968 One high-level comment: unless I've overlooked it, there doesn't seem to be any documentation in the code to explain what the `nullSafe` concept means here, although

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127403525 Expression's use of `nullSafe` seems to be safe due to absence of nulls, whereas this patch seems to use it as safe to receive nulls / shuffle nulls.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-08-03 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/7886#issuecomment-127405648 Actually I'm going to drop review of this for now and focus on pulling in SMJ first. That will conflict with this patch but we can remember to update SMJ's

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-28 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-106337344 Sorry, this will cause a performance regression for cases like `left join a.key=b.key group by a.key`; I will figure out how to fix it soon.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105964846 [Test build #33593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33593/consoleFull) for PR 6413 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105964199 Merged build started.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105964160 Merged build triggered.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105945062 [Test build #33589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33589/consoleFull) for PR 6413 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105944470 Merged build triggered.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105945512 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105945503 [Test build #33589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33589/consoleFull) for PR 6413 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105945516 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105944510 Merged build started.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-106007175 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-106007176 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-106007156 [Test build #33593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33593/consoleFull) for PR 6413 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread jeanlyn
Github user jeanlyn commented on a diff in the pull request: https://github.com/apache/spark/pull/6413#discussion_r31158220 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -32,6 +32,26 @@ import org.apache.spark.sql.types._

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/6413#discussion_r31193985 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala --- @@ -32,6 +32,26 @@ import org.apache.spark.sql.types._

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/6413#discussion_r31199761 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -49,8 +49,19 @@ case object AllTuples

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-27 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/6413#discussion_r3119 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -49,8 +49,19 @@ case object AllTuples extends

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105570397 [Test build #33519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33519/consoleFull) for PR 6413 at commit

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105570411 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105570409 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105562584 Merged build started.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105562503 Merged build triggered.

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/6413 [SPARK-7871] [SQL] Improve the outputPartitioning for HashOuterJoin https://issues.apache.org/jira/browse/SPARK-7871

[GitHub] spark pull request: [SPARK-7871] [SQL] Improve the outputPartition...

2015-05-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-105563789 [Test build #33519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33519/consoleFull) for PR 6413 at commit