[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-05-05 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-217158802 Sorry, somehow I missed this go by; I haven't looked at the code changes in detail yet. The TaskEnd event should be sent all the time now; we fixed this bug a w

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-05-04 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-217052874 Also cc @vanzin @srowen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-186486879 Hi @andrewor14, thanks a lot for your comments. The reason I introduced another data structure to track each executor's stage and task numbers is mention

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-186368937 @jerryshao I took a look at this and it looks overly complicated. It seems that the problem is sometimes we have negative `totalRunningTasks` and that leads to undes

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/11205#discussion_r53505158 --- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala --- @@ -890,6 +890,43 @@ class ExecutorAllocationManagerSuite

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/11205#discussion_r53505109 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -540,9 +540,11 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/11205#discussion_r53498775 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -540,9 +540,11 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/11205#discussion_r53498726 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -540,9 +540,11 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-186352005 Just to add a link here to the previous PR #9288

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-184122268 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-184122265 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-184121750 **[Test build #51298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51298/consoleFull)** for PR 11205 at commit [`966eb89`](https://g

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11205#issuecomment-184088639 **[Test build #51298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51298/consoleFull)** for PR 11205 at commit [`966eb89`](https://gi

[GitHub] spark pull request: [SPARK-11334][Core] Handle maximum task failur...

2016-02-14 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/11205 [SPARK-11334][Core] Handle maximum task failure situation in dynamic allocation Currently there are two problems in dynamic allocation when the maximum number of task failures is reached: 1. Number of runn