Repository: spark
Updated Branches:
  refs/heads/master 360ed832f -> 9fc16a82a


[SPARK-11306] Fix hang when JVM exits.

This commit fixes a bug where, in Standalone mode, if a task fails and crashes
the JVM, the failure is treated as a "normal" exit (meaning it's considered
unrelated to the task), so the failure isn't counted against the task's maximum
number of failures:
https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
As a result, a task that fails in a way that crashes the JVM is continuously
re-launched, resulting in a hang. This commit fixes that problem.
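
For context, a minimal self-contained sketch of the counting decision described
above. The names here (shouldCountAgainstTask, the simplified case classes) are
illustrative stand-ins, not the actual TaskSetManager/scheduler code:

    // Illustrative sketch only -- not the real scheduler code.
    object FailureCountingSketch {
      sealed trait ExecutorLossReason
      case class ExecutorExited(exitCode: Int, isNormalExit: Boolean, reason: String)
        extends ExecutorLossReason
      case class SlaveLost(message: String) extends ExecutorLossReason

      // A "normal" exit is treated as unrelated to the running task, so it is
      // not charged against the task's failure count.
      def shouldCountAgainstTask(reason: ExecutorLossReason): Boolean = reason match {
        case ExecutorExited(_, isNormalExit, _) => !isNormalExit
        case _ => false // assumption for this sketch; other reasons are handled elsewhere
      }
    }

With the pre-fix value isNormalExit = true, shouldCountAgainstTask returns false
for a JVM crash, so the crashed task's failure count never increases and the
task is re-launched indefinitely.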

This bug was introduced by #8007; andrewor14, mccheah, vanzin, can you take a
look at this?

This error is hard to trigger because we handle executor losses through two
code paths (the second is via Akka, where Akka notices that the executor
endpoint is disconnected). In my setup, the Akka code path completes first and
doesn't have this bug, so things work fine (see my recent email to the dev list
about this). If I manually disable the Akka code path, I can see the hang (and
this commit fixes the issue).
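
As a rough sketch of those two paths, with all method and message names below
being simplified stand-ins rather than the exact backend code:

    // Sketch of the two executor-loss paths, for illustration only.
    object TwoLossPathsSketch {
      sealed trait ExecutorLossReason
      case class ExecutorExited(exitCode: Int, isNormalExit: Boolean, reason: String)
        extends ExecutorLossReason
      case class SlaveLost(message: String) extends ExecutorLossReason

      // Stand-in for the scheduler's shared executor-removal handling.
      def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit =
        println(s"Removing executor $executorId: $reason")

      // Path 1: the standalone Master reports the removal together with the
      // executor's exit status (the path patched by this commit).
      def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]): Unit = {
        val reason = exitStatus match {
          case Some(code) => ExecutorExited(code, isNormalExit = false, message) // post-fix value
          case None => SlaveLost(message)
        }
        removeExecutor(fullId, reason)
      }

      // Path 2: the RPC layer notices that the executor endpoint disconnected
      // and reports the loss with a generic reason. In the author's setup this
      // path completed first, which is why the bug in path 1 was masked.
      def onDisconnected(executorId: String): Unit =
        removeExecutor(executorId, SlaveLost("Remote executor endpoint disconnected"))
    }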

Author: Kay Ousterhout <kayousterh...@gmail.com>

Closes #9273 from kayousterhout/SPARK-11306.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9fc16a82
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9fc16a82
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9fc16a82

Branch: refs/heads/master
Commit: 9fc16a82adb5f3db2a250765c11393794404a51b
Parents: 360ed83
Author: Kay Ousterhout <kayousterh...@gmail.com>
Authored: Tue Oct 27 10:46:43 2015 -0700
Committer: Kay Ousterhout <kayousterh...@gmail.com>
Committed: Tue Oct 27 10:46:43 2015 -0700

----------------------------------------------------------------------
 .../spark/scheduler/cluster/SparkDeploySchedulerBackend.scala      | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9fc16a82/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
index 2625c3e..a4214c4 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
@@ -137,7 +137,7 @@ private[spark] class SparkDeploySchedulerBackend(
 
   override def executorRemoved(fullId: String, message: String, exitStatus: Option[Int]) {
     val reason: ExecutorLossReason = exitStatus match {
-      case Some(code) => ExecutorExited(code, isNormalExit = true, message)
+      case Some(code) => ExecutorExited(code, isNormalExit = false, message)
       case None => SlaveLost(message)
     }
     logInfo("Executor %s removed: %s".format(fullId, message))

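
To make the effect of the one-line change concrete, a simplified before/after
sketch (the abort handling below is illustrative, not the actual TaskSetManager
code; spark.task.maxFailures defaults to 4):

    // Simplified sketch of the behavioral difference; not the real scheduler code.
    object BeforeAfterSketch {
      // Before the fix: isNormalExit = true, the crash is not blamed on the task,
      // its failure count never grows, and it is re-launched forever (the hang).
      // After the fix: isNormalExit = false, each crash increments the count, and
      // once it reaches spark.task.maxFailures the TaskSet is aborted.
      def nextFailureCount(current: Int, isNormalExit: Boolean, maxTaskFailures: Int = 4): Int = {
        val updated = if (isNormalExit) current else current + 1
        if (updated >= maxTaskFailures) {
          println(s"Task failed $updated times; aborting TaskSet instead of re-launching")
        }
        updated
      }
    }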
