[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-26 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-76318262 Sorry for the delay. My initial idea here is: 1. We can set spark.default.parallelism to control the number of partitions for a shuffle, but this config option is not sensitive to data s

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-26 Thread scwf
Github user scwf closed the pull request at: https://github.com/apache/spark/pull/3694 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enable

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-76069557 I agree. @scwf would you mind closing this issue?

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-25 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75936767 I suggest we close this as I see arguments against, and no replies to those and/or the motivation for this change.

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-24 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75903702 I do not think that a global default ratio is right, because within a job the size of each stage is different, and the sizes do not consistently increase or decrease. If we define a p

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-23 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75584312 @srowen good point. I think a ratio argument is prettier than an expression, but arguably not enough to warrant clogging up the API.

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-23 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75581853 You can implement this by expressing parallelism as a function of the parent RDD, right? Yes, you have to write the expression, but does an alternative multiplier arg do muc
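
For readers of this thread, the idea of "expressing parallelism as a function of the parent RDD" can be sketched roughly as follows. This is only an illustration of the arithmetic, not code from the pull request: the helper name `scaled_partitions` and its `ratio` parameter are hypothetical, and the sketch deliberately avoids any Spark dependency so it runs on its own.

```python
import math


def scaled_partitions(parent_partitions: int, ratio: float) -> int:
    """Return a shuffle partition count scaled from the parent's count.

    `ratio` is a hypothetical per-call multiplier chosen by the caller;
    it is not a Spark configuration setting.
    """
    if parent_partitions < 1:
        raise ValueError("parent_partitions must be at least 1")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Round up so a fractional result never drops below one partition.
    return max(1, math.ceil(parent_partitions * ratio))


if __name__ == "__main__":
    # A parent with 200 partitions whose data is expected to triple in size.
    print(scaled_partitions(200, 3.0))  # prints 600
```

In PySpark, such a value could be passed as the existing `numPartitions` argument, for example `rdd.reduceByKey(f, numPartitions=scaled_partitions(rdd.getNumPartitions(), 3.0))`, which is the kind of hand-written expression the comment above refers to.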

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-23 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75580971 In general, a fixed number of partitions is very difficult to work with when configuring a shuffle. Suppose I have a job where I know a `flatMap` is going to blow up the s

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-23 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75563929 I am also not clear this is a good thing. As a default, it doesn't change anything. There is probably not a globally correct ratio, even if it's not 1, but this implies th

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2015-02-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-75145453 Hi @scwf can you elaborate on the motivation for this?

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-67102568 [Test build #24474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24474/consoleFull) for PR 3694 at commit [`f21bfd4`](https://gith

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-67102573 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-67095615 [Test build #24474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24474/consoleFull) for PR 3694 at commit [`f21bfd4`](https://githu

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-15 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-67095261 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-66956294 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-66956286 [Test build #24450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24450/consoleFull) for PR 3694 at commit [`f21bfd4`](https://gith

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-14 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-66956061 Hmm, it seems there are some problems with `org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDDSuite`, and I noticed that other PRs also failed there.

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-66952871 [Test build #24450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24450/consoleFull) for PR 3694 at commit [`f21bfd4`](https://githu

[GitHub] spark pull request: [SPARK-4845][Core] Adding a parallelismRatio t...

2014-12-14 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3694#issuecomment-66952763 Jenkins, retest this please.