[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/10376#discussion_r49664329 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala --- @@ -169,6 +170,41 @@ class

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171463572 Actually, upon closer inspection I think that my suggestion might not work because of the lifecycle of when these methods are called. Therefore I'm now inclined to

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10376 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171453339 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171460336 This looks good to me overall. I have a couple of suggestions for how we might simplify the test case; please take a look at my comments and let me know whether you

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171478325 **[Test build #49334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49334/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171478514 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/10376#discussion_r49662046 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala --- @@ -169,6 +170,41 @@ class

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/10376#discussion_r49662078 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala --- @@ -104,7 +104,7 @@ class

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread jerryshao
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/10376#discussion_r49662917 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala --- @@ -104,7 +104,7 @@ class

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171483932 Merging this into master (2.0.0). Thanks @jerryshao!

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171456954 **[Test build #49334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49334/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/10376#discussion_r49664103 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala --- @@ -104,7 +104,7 @@ class

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171478512 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-13 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-171461821 Thanks a lot @JoshRosen for your comments, I will try to simplify the test if possible.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-06 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169541359 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169563738 **[Test build #48892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48892/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169563864 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169542967 **[Test build #48892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48892/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169215312 **[Test build #48809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48809/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169215385 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169215387 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-05 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169198730 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2016-01-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-169200095 **[Test build #48809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48809/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168112545 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168112524 **[Test build #48526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48526/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168112546 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168099236 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168099215 **[Test build #48510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48510/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168099235 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168100863 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168080367 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168080843 **[Test build #48510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48510/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-168101248 **[Test build #48526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48526/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-20 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-166170360 Hi @JoshRosen , from performance point I don't think there's a big difference with this patch, since at most we will only open `200 * Cores` number of files

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/10376 [SPARK-12400][Shuffle] Avoid generating temp shuffle files for empty partitions This problem lies in `BypassMergeSortShuffleWriter`, empty partition will also generate a temp shuffle file with
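A minimal sketch of the idea named in the pull request title, not the patch itself: a temp file is created only for partitions that actually hold records, so empty partitions leave nothing on disk. The class name `EmptyPartitionSketch`, the method `writeNonEmptyPartitions`, and the `temp_shuffle_` file prefix are hypothetical and do not come from Spark; records are modelled as plain strings for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Spark's BypassMergeSortShuffleWriter.
class EmptyPartitionSketch {

    /**
     * Writes one temp file per non-empty partition under dir and returns
     * the paths that were created. Empty partitions are skipped entirely,
     * so no zero-length file is produced for them.
     */
    static List<Path> writeNonEmptyPartitions(Path dir, List<List<String>> partitions)
            throws IOException {
        List<Path> created = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            List<String> records = partitions.get(i);
            if (records.isEmpty()) {
                continue; // nothing to write, so do not create a file at all
            }
            // Assumed naming scheme, chosen for the example only.
            Path file = Files.createTempFile(dir, "temp_shuffle_" + i + "_", ".data");
            Files.write(file, records); // one record per line, UTF-8
            created.add(file);
        }
        return created;
    }
}
```

With three partitions of which one is empty, the sketch creates two files rather than three; the saving grows with the number of reducers that receive little or no data.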

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165744677 **[Test build #47999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47999/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165766313 **[Test build #47999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47999/consoleFull)** for PR 10376 at commit

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165766403 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165766405 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165893735 Hey, just curious: did this result in any perf. improvements? I've considered this change a couple of times but in my own benchmarking work it didn't seem to make a

[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...

2015-12-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-165898712 This reduces the number of files when using a very large number of reducers with little data. Good to do unless there are major risks.