Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-76318262
Sorry for the delay. My initial idea here is:
1. We can set spark.default.parallelism to control the number of partitions for
shuffle, but this config option is not sensitive to data s
Github user scwf closed the pull request at:
https://github.com/apache/spark/pull/3694
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enable
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-76069557
I agree. @scwf would you mind closing this issue?
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75936767
I suggest we close this as I see arguments against, and no replies to those
and/or the motivation for this change.
Github user lianhuiwang commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75903702
I do not think that a global default ratio is right, because in a job the
size of each stage is different and they are not increasing or decreasing. If
we define a p
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75584312
@srowen good point. I think a ratio argument is prettier than an
expression, but arguably not enough to warrant clogging up the API.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75581853
You can implement this by expressing parallelism as a function of the
parent RDD, right? Yeah, you have to write the expression, but does an alternative
multiplier arg do muc
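A minimal sketch of the idea in the comment above, assuming the goal is to derive the shuffle partition count from the parent RDD's partition count instead of adding a ratio argument to the API. The helper name `scaled_partitions` and its `ratio` parameter are hypothetical and are not part of Spark or of this pull request; only `getNumPartitions()` and the `numPartitions` argument of `reduceByKey` mentioned in the closing comment are PySpark calls.

```python
import math


def scaled_partitions(parent_partitions: int, ratio: float) -> int:
    """Derive a shuffle partition count from the parent's partition count.

    Hypothetical helper for illustration only; not a Spark API.
    """
    if parent_partitions < 1:
        raise ValueError("parent_partitions must be at least 1")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Round up so a fractional result still gets a partition, and never
    # return fewer than one partition.
    return max(1, math.ceil(parent_partitions * ratio))


# With PySpark, the result would be passed explicitly at the call site,
# for example (assuming `pairs` is an RDD of key/value tuples):
#   n = scaled_partitions(pairs.getNumPartitions(), 2.0)
#   counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=n)
```

Written this way, the ratio lives in user code next to the shuffle it applies to, so each stage can pick its own value.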
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75580971
In general, a fixed number of partitions is very difficult to work with
when configuring a shuffle. Suppose I have a job where I know a `flatMap` is
going to blow up the s
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75563929
I am also not clear this is a good thing. As a default, it doesn't change
anything. There is probably not a globally correct ratio, even if it's not 1,
but this implies th
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-75145453
Hi @scwf can you elaborate on the motivation for this?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-67102568
[Test build #24474 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24474/consoleFull)
for PR 3694 at commit
[`f21bfd4`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-67102573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-67095615
[Test build #24474 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24474/consoleFull)
for PR 3694 at commit
[`f21bfd4`](https://githu
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-67095261
Jenkins, retest this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-66956294
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-66956286
[Test build #24450 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24450/consoleFull)
for PR 3694 at commit
[`f21bfd4`](https://gith
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-66956061
Hmm, it seems there are some problems with
```org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDDSuite```, and I
noticed that other PRs also failed there.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-66952871
[Test build #24450 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24450/consoleFull)
for PR 3694 at commit
[`f21bfd4`](https://githu
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/3694#issuecomment-66952763
Jenkins, retest this please.