[GitHub] spark pull request: [SPARK-10087] [CORE] Disable spark.shuffle.red...

2015-08-19 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132799456 @shivaram did you create a JIRA for making this affect only ShuffledRDD? I might do it as part of https://issues.apache.org/jira/browse/SPARK-9852, which I'm working on

2015-08-19 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132804519 Not yet - I was hoping to keep SPARK-10087 open, but I guess that's closed now. Doing it as part of SPARK-9852 sounds good to me. Let me know if you want me to review

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132322822 cc @mateiz who has also been looking at this code recently

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132320966 Could you provide some more information about the map output? The reducer locality should not kick in unless a certain map output location has more than 20% of the
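
The 20% rule mentioned above can be illustrated with a minimal sketch. This is not Spark's actual implementation: the object name, the `preferredHosts` helper, and the choice to measure each host's share against the total map output bytes for one reducer are all assumptions made for illustration, since the original comment is cut off.

```scala
// Hypothetical helper for illustration only; not taken from the Spark code base.
object ReducerLocalitySketch {
  // Threshold quoted in the thread: a location must hold more than 20%.
  val FractionThreshold: Double = 0.2

  /** Returns the hosts whose share of the map output bytes exceeds `threshold`.
    * `bytesByHost` maps a host name to the bytes of map output it holds
    * for a single reducer (an assumed input shape).
    */
  def preferredHosts(
      bytesByHost: Map[String, Long],
      threshold: Double = FractionThreshold): Seq[String] = {
    val total = bytesByHost.values.sum
    if (total <= 0L) {
      Seq.empty
    } else {
      bytesByHost.collect {
        case (host, bytes) if bytes.toDouble / total > threshold => host
      }.toSeq.sorted
    }
  }
}
```

Under these assumptions, output spread evenly over ten hosts gives no preferred host (each holds 10%), whereas a map stage with a single task puts 100% of its output on one host, so that host is preferred for every reducer.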

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132325480 Thanks for the info -- And just to confirm, is everything getting assigned to Executor ID 23 (10.0.145.27) in the reduce stage?

2015-08-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132320481 cc @shivaram

2015-08-18 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132324006 The reduce stage has a 2-way join in it. The two map stages had 30 and 1 tasks, respectively. For the stage having 30 tasks, here is the screenshot of task info

2015-08-18 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132328891 Ah, sorry, I missed the reducer stage's screenshot. Yes, executor 23 was the one that got all the reduce tasks.

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132331662 So my hypothesis right now is that the RDD in the reduce stage has two Shuffle dependencies and the first shuffle dependency happens to be the single map task stage --

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132346850 Test FAILed. Refer to this link for build results (access rights to CI server needed):

2015-08-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132346765 [Test build #41149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41149/console) for PR 8280 at commit

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132345128 The diff I'm proposing is something like ``` +val numShuffleDeps = rdd.dependencies.filter(_.isInstanceOf[ShuffleDependency[_, _, _]]).length +
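
The quoted diff is cut off after the dependency count, so what follows it is unknown. As a rough sketch of the idea discussed elsewhere in the thread (turning reducer locality off when an RDD has multiple shuffle dependencies), the guard might look like the code below. The `Dep` types are stand-ins so the example runs without Spark on the classpath, and `useReducerLocality` is a hypothetical name, not a Spark API.

```scala
// Stand-in types that only mimic the shape of Spark's Dependency hierarchy;
// they are assumptions so this sketch compiles without Spark.
sealed trait Dep
final case class ShuffleDep(numMapTasks: Int) extends Dep
case object NarrowDep extends Dep

object ShuffleDepGuardSketch {
  /** Counts shuffle dependencies, mirroring the filter in the quoted diff. */
  def numShuffleDeps(deps: Seq[Dep]): Int =
    deps.count(_.isInstanceOf[ShuffleDep])

  /** Assumed policy: apply reducer locality only when the feature is enabled
    * and the RDD has exactly one shuffle dependency.
    */
  def useReducerLocality(deps: Seq[Dep], enabled: Boolean): Boolean =
    enabled && numShuffleDeps(deps) == 1
}
```

With this guard, a join fed by two map stages (for example 30 tasks and 1 task, as reported in the thread) would skip reducer locality, so the single-task stage could no longer pull every reduce task onto one executor.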

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132415909 Ok - let's leave it on in master and I'll work with @mateiz on changes to move this to ShuffledRDD and capture more use cases. @yhuai could you put in the query you ran

2015-08-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132415200 Let's close this one. @shivaram can you submit a proper fix for master?

2015-08-18 Thread yhuai
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/8280

2015-08-18 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132417459 @shivaram Sure. Just updated the JIRA description.

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132346849 Merged build finished. Test FAILed.

2015-08-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132401226 Why don't we turn it on in master but off in 1.5? At this point in the 1.5 cycle, I'm worried about potential bugs this would cause after more fixes.

2015-08-18 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132414515 I created https://github.com/apache/spark/pull/8296 to change the default setting to false for branch 1.5.

2015-08-18 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132394991 It does sound good to turn it off if there are multiple dependencies. However, an even better solution may be to move this into ShuffledRDD, so that we control where

2015-08-18 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132408576 Sorry, just too risky right now for 1.5.

2015-08-18 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132395677 BTW it may also be fine to turn it off by default for 1.5, but in general, with these things, there's not much point having them in the code if they're off by default.

2015-08-18 Thread shivaram
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8280#issuecomment-132403639 But to Matei's point, we don't get feedback if it's on in the master branch, as I guess many more people use a release. I think turning it off for the multiple dependency