[GitHub] spark issue #18874: [SPARK-21656][CORE] spark dynamic allocation should not ...

2017-08-16 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 +1

2017-08-15 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/18874 @srowen you have a good point about a case that becomes worse after this change. Still, I think this change is better on balance. Btw, there are even more odd cases with dynamic …

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18874 Merged build finished. Test PASSed.

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80645/ Test PASSed.

2017-08-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18874 **[Test build #80645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80645/testReport)** for PR 18874 at commit …

2017-08-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18874 **[Test build #80645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80645/testReport)** for PR 18874 at commit …

2017-08-11 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 The minimum count is still needed; it's needed between stages, when the number of tasks goes below the minimum count. It's either going to keep the minimum number of executors or enough executors to …

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 Seems not unreasonable to me given the current problem statement. It does solve the possible problem about 0 executors, and then some. The possible impact on a normal app is like: run a …

2017-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/18874 This change makes sense to me. Tom's last comment, about resetting that timeout every time one task is scheduled, I think explains how you get into this situation and why you don't actually …

2017-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 I think the issue with the locality is that it resets the time (3s wait) whenever it schedules any task at that particular locality level (in this case node local) on any node. So it can take …
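A minimal sketch of the reset behavior described in this comment, under assumed names (this is illustrative, not Spark's actual TaskSetManager code): every task launched at a locality level restarts that level's wait timer, so a steady trickle of node-local launches can keep postponing the fallback to a less-local level.

```python
class DelaySchedulingSketch:
    """Toy model of delay scheduling with a per-level wait (default 3s)."""

    def __init__(self, wait_ms: int = 3_000):
        self.wait_ms = wait_ms
        self.last_launch_ms = 0

    def record_launch(self, now_ms: int) -> None:
        # A launch at this locality level, on ANY node, resets the timer.
        self.last_launch_ms = now_ms

    def may_fall_back(self, now_ms: int) -> bool:
        # Fall back to a less-local level only after a full quiet interval
        # with no launches at the current level.
        return now_ms - self.last_launch_ms >= self.wait_ms
```

In this model, a task launched at t=0 blocks fallback until t=3000, but another launch at t=2000 pushes the fallback out to t=5000, and so on, which is one way the fallback can be delayed far past a single 3s wait.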

2017-08-10 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18874 I think the fix makes sense; the part that is not clear is why this is happening, since the default locality timeout is 3s and the default executor idle timeout is 60s, so they really shouldn't …
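As an illustration of why these two defaults look safe on paper, here is a hypothetical sketch. The constants mirror the documented defaults of `spark.locality.wait` (3s) and `spark.dynamicAllocation.executorIdleTimeout` (60s); the function itself is illustrative, not Spark code. A single locality wait expires well inside the idle window, but enough resets of the 3s wait can outlast the 60s idle timeout.

```python
# Defaults of the two settings being compared in this thread.
LOCALITY_WAIT_MS = 3_000    # spark.locality.wait
IDLE_TIMEOUT_MS = 60_000    # spark.dynamicAllocation.executorIdleTimeout

def falls_back_before_idle(resets: int) -> bool:
    """True if locality scheduling gives up (falls back a level) before an
    executor would be considered idle, given `resets` restarts of the wait."""
    return (resets + 1) * LOCALITY_WAIT_MS < IDLE_TIMEOUT_MS
```

With zero resets the fallback wins easily (3s vs 60s); with roughly twenty resets the locality wait has consumed the whole idle window, which matches the confusion here about how the two timeouts could ever interact.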

2017-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 Also note that I would like to investigate making the locality logic in the scheduler better, as I don't think it should take 60+ seconds for it to fall back to using a node for rack local.

2017-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 To answer a few of your last questions: it doesn't hurt the common case; in the common case all your executors have tasks on them as long as there are tasks to run. Normally the scheduler can …

2017-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 I've updated the description in https://issues.apache.org/jira/browse/SPARK-21656 to join all my comments here together, hopefully that clarifies it.

2017-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 @tgravescs that's actually progress. You're no longer saying that the goal is to keep a few executors around just in case (https://issues.apache.org/jira/browse/SPARK-21656) or that the problem is …

2017-08-10 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18874 I think the current fix is a feasible and simple solution for the scenarios mentioned above. As far as I understand from the comments above, ideally this problem should not happen, but in a …

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 I suggest you go understand the code. I've already explained this multiple times. You get 0 executors by there being delays when an executor doesn't have a task scheduled. Say you …

2017-08-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 How do you reach 0 executors when there is still a task to schedule? That, if anything, is the bug, but it isn't what's contemplated here, so, confused. I disagree; the rest of your scenarios …

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 I'm saying you have a stage running that has > 0 tasks to run. If dynamic allocation has already got all the executors it originally thought it needed and they all idle-timeout, then you have 0 …

2017-08-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 Why is 0 executors a 'deadlock'? If there is no work to do, 0 executors is fine. If there is work to do, of course, at least 1 executor should not time out. Is that what you're claiming happens?

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 There is nothing in the code stopping you from idle-timing-out all of your executors; thus executors are 0 and you deadlock. 0 executors = deadlock = definite bug. We definitely …
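A minimal model of the deadlock being claimed here, under assumed names (this is a sketch, not Spark's ExecutorAllocationManager): with a minimum of 0, the idle-timeout path can release every executor even while tasks remain pending, and nothing looks back to request replacements.

```python
class AllocationSketch:
    """Toy model of the pre-fix idle-timeout behavior being discussed."""

    def __init__(self, min_executors: int):
        self.min_executors = min_executors
        self.executors = 0

    def start(self, initial: int) -> None:
        self.executors = initial

    def remove_idle(self, idle_count: int) -> None:
        # Idle executors are releasable down to the configured minimum, with
        # no check against the number of tasks that still need to run.
        self.executors = max(self.min_executors, self.executors - idle_count)
```

With `min_executors=0`, starting at 5 executors and idling all 5 leaves the app at 0 executors regardless of pending work, which is the hang the thread is debating.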

2017-08-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 Going to 0 executors is not a bug, if you set the min to 0. A deadlock is a bug. But, nothing in the JIRA or here suggests there's a deadlock -- what do you mean?

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 Is going to 0 executors and allowing a deadlock a bug?

2017-08-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 That is correct behavior, as defined by the idle timeout and the min number of executors, which are already configured. I do not understand why going to the small number that the config explicitly …

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 The bug is that with the idle timeout the number of executors can go to a very small number, even zero, and we never look back to make sure that doesn't happen.

2017-08-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18874 Doesn't this make the 'target' effectively the minimum? As I say on the JIRA I still do not see a behavior that needs fixing here.

2017-08-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 @yoonlee95 please update with unit tests

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18874 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80360/ Test FAILed.

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18874 Merged build finished. Test FAILed.

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18874 **[Test build #80360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80360/testReport)** for PR 18874 at commit …

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18874 **[Test build #80360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80360/testReport)** for PR 18874 at commit …

2017-08-07 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18874 ok to test

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18874 Can one of the admins verify this patch?