[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617808 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -134,6 +176,18 @@ private[sql] abstract class SparkStrategies exte

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198197316 @rxin Shuffle hash join could be turned off using spark.sql.autoBroadcastThreshold; should we create another config for a similar purpose? --- If your project is set up for

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-20 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198070120 **[Test build #53451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53451/consoleFull)** for PR 11788 at commit [`6385777`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198002057 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56607971 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinHash.scala --- @@ -1,58 +0,0 @@ -/* - * Licensed to the Apache Softw

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56616679 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinHash.scala --- @@ -1,58 +0,0 @@ -/* - * Licensed to the Apache Softw

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617782 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Softw

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198070544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56620669 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -118,6 +148,18 @@ private[sql] abstract class SparkStrategies ex

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198226134 **[Test build #53513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53513/consoleFull)** for PR 11788 at commit [`dea3615`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198002032 **[Test build #53441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53441/consoleFull)** for PR 11788 at commit [`f621725`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198010601 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198025367 **[Test build #53450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53450/consoleFull)** for PR 11788 at commit [`797fe2d`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197995227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198026929 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198032509 **[Test build #53451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53451/consoleFull)** for PR 11788 at commit [`6385777`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198266969 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56696068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -101,10 +100,41 @@ private[sql] abstract class SparkStrategies ex

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198001353 **[Test build #53441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53441/consoleFull)** for PR 11788 at commit [`f621725`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617839 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -134,6 +176,18 @@ private[sql] abstract class SparkStrategies exte

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197999654 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197997856 **[Test build #53439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53439/consoleFull)** for PR 11788 at commit [`3b6234a`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617629 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -101,10 +100,41 @@ private[sql] abstract class SparkStrategies ext

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197997871 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56548669 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala --- @@ -247,7 +247,26 @@ class BenchmarkWholeStageCodegen e

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11788

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198198599 I was thinking of an option to prefer sort merge join. My main concern is that it is easier to OOM when there is skew in hash joins.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56616530 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/LeftSemiJoinHash.scala --- @@ -1,58 +0,0 @@ -/* - * Licensed to the Apache Sof

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198230010 lgtm - please address the minor issues in ur next pr.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198177686 Can we add a config option to allow turning off shuffle hash join?

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198070539 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198026931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198463977 Merging this into master; those minor comments will be addressed in follow-up PRs.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197997300 **[Test build #53439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53439/consoleFull)** for PR 11788 at commit [`3b6234a`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197994373 **[Test build #53437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53437/consoleFull)** for PR 11788 at commit [`94c4004`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197999084 **[Test build #53440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53440/consoleFull)** for PR 11788 at commit [`f771ec9`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198026914 **[Test build #53450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53450/consoleFull)** for PR 11788 at commit [`797fe2d`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197999659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197997865 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198266972 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198205772 spark.sql.join.preferSortMergeJoin = true ?
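
The config discussion in this thread (a size threshold that can disable shuffle hash join, plus a flag to prefer sort merge join) can be illustrated with a toy planner. This is only a sketch under stated assumptions, not the code from the pull request: `JoinConf`, `choose_join`, the default values, and the "build side fits per partition" rule are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JoinConf:
    # Hypothetical stand-ins for the settings discussed in the thread.
    auto_broadcast_threshold: int = 10 * 1024 * 1024  # bytes; assumed default
    prefer_sort_merge_join: bool = True               # the proposed flag
    shuffle_partitions: int = 200                     # assumed partition count


def choose_join(left_bytes: int, right_bytes: int, conf: JoinConf) -> str:
    """Pick a join strategy name from estimated input sizes (toy model)."""
    smaller = min(left_bytes, right_bytes)

    # Small enough to ship to every executor: broadcast the smaller side.
    if smaller <= conf.auto_broadcast_threshold:
        return "BroadcastHashJoin"

    # Assumed rule: a shuffled hash join is only considered when the user
    # has not asked to prefer sort merge join and one partition of the
    # smaller side is expected to fit under the threshold.
    fits_per_partition = (
        smaller < conf.auto_broadcast_threshold * conf.shuffle_partitions
    )
    if not conf.prefer_sort_merge_join and fits_per_partition:
        return "ShuffledHashJoin"

    # Fallback that does not need a whole partition in a hash table,
    # which matters when keys are skewed.
    return "SortMergeJoin"


MB = 1024 * 1024
print(choose_join(1 * MB, 1024 * MB, JoinConf()))
print(choose_join(100 * MB, 1024 * MB, JoinConf(prefer_sort_merge_join=False)))
print(choose_join(100 * MB, 1024 * MB, JoinConf()))
```

In this model, setting the threshold to a negative value disables both hash-based strategies, which matches the point that the existing threshold can already turn shuffle hash join off; the separate flag lets a user keep broadcast joins while still avoiding shuffled hash joins.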

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197999643 **[Test build #53440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53440/consoleFull)** for PR 11788 at commit [`f771ec9`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56618062 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -118,6 +148,18 @@ private[sql] abstract class SparkStrategies exte

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198010590 **[Test build #53444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53444/consoleFull)** for PR 11788 at commit [`2122986`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198009917 **[Test build #53444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53444/consoleFull)** for PR 11788 at commit [`2122986`](https://gi

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197995219 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198010606 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/11788 [SPARK-13977] [SQL] Brings back Shuffled hash join ## What changes were proposed in this pull request? ShuffledHashJoin (also outer join) was removed in 1.6, in favor of SortMergeJoin, which

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198266745 **[Test build #53513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53513/consoleFull)** for PR 11788 at commit [`dea3615`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56548582 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -263,32 +263,20 @@ class SQLMetricsSuite extends SparkFu

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198205940 sgtm

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56698309 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -101,10 +100,41 @@ private[sql] abstract class SparkStrategies e

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198002047 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-197995203 **[Test build #53437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53437/consoleFull)** for PR 11788 at commit [`94c4004`](https://g

[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...

2016-03-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56620771 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -118,6 +148,18 @@ private[sql] abstract class SparkStrategies exte