[ https://issues.apache.org/jira/browse/SPARK-30310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-30310.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26955
[https://github.com/apache/spark/pull/26955]

> SparkUncaughtExceptionHandler halts running process unexpectedly
> -----------------------------------------------------------------
>
>                 Key: SPARK-30310
>                 URL: https://issues.apache.org/jira/browse/SPARK-30310
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Tin Hang To
>            Assignee: Tin Hang To
>            Priority: Major
>             Fix For: 3.0.0
>
>
> During 2.4.x testing, we have had many occasions where the Worker process
> goes DEAD unexpectedly, with the Worker log ending with:
>
>     ERROR SparkUncaughtExceptionHandler: scala.MatchError: <...callstack...>
>
> We see the same callstack during our 2.3.x testing, but there the Worker
> process stays up.
>
> Comparing the 2.4.x SparkUncaughtExceptionHandler.scala with the 2.3.x
> version, we found that SPARK-24294 introduced the following change:
>
>     exception match {
>       case _: OutOfMemoryError =>
>         System.exit(SparkExitCode.OOM)
>       case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>
>         // SPARK-24294: This is defensive code, in case that SparkFatalException is
>         // misused and uncaught.
>         System.exit(SparkExitCode.OOM)
>       case _ if exitOnUncaughtException =>
>         System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
>     }
>
> This match has a case for _ if exitOnUncaughtException, but no catch-all _
> case. As a result, when exitOnUncaughtException is false (Master and Worker)
> and the exception does not match any of the cases (e.g.,
> IllegalStateException), Scala throws MatchError(exception), a MatchError
> wrapping the original exception. The catch block below then treats this as a
> second uncaught exception and halts the entire process with
> SparkExitCode.UNCAUGHT_EXCEPTION_TWICE:
>
>     catch {
>       case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)
>       case t: Throwable =>
>         Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)
>     }
>
> Therefore, even when exitOnUncaughtException is false, the process halts.
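For readers following along, below is a minimal, self-contained Scala sketch
of the failure mode described in the issue. The names ExitCode, handle, and
the exit-code numbers are stand-ins invented for illustration; this is not
the actual SparkUncaughtExceptionHandler source, and it prints instead of
exiting so the control flow can be observed directly.

    object MatchErrorSketch {

      // Illustrative stand-ins for SparkExitCode values; the numbers are
      // placeholders, not taken from the Spark source.
      object ExitCode {
        val OOM = 52
        val UNCAUGHT_EXCEPTION = 50
        val UNCAUGHT_EXCEPTION_TWICE = 51
      }

      // Per the issue, Master and Worker run the handler with this set to false.
      val exitOnUncaughtException = false

      def handle(exception: Throwable): Unit = {
        try {
          // 2.4.x-style match: there is no unconditional `case _` when
          // exitOnUncaughtException is false, so an exception such as
          // IllegalStateException matches nothing and Scala throws a MatchError.
          exception match {
            case _: OutOfMemoryError =>
              println(s"would call System.exit(${ExitCode.OOM})")
            case _ if exitOnUncaughtException =>
              println(s"would call System.exit(${ExitCode.UNCAUGHT_EXCEPTION})")
            // A catch-all case here (the gist of the fix merged for this issue)
            // would keep the handler itself from throwing:
            // case _ => println("log it and keep the process alive")
          }
        } catch {
          // The MatchError lands here and is treated as a second uncaught
          // exception, so the process is halted even though
          // exitOnUncaughtException is false.
          case _: Throwable =>
            println(s"would call Runtime.getRuntime.halt(${ExitCode.UNCAUGHT_EXCEPTION_TWICE})")
        }
      }

      def main(args: Array[String]): Unit = {
        handle(new IllegalStateException("not handled by any case"))
        // prints: would call Runtime.getRuntime.halt(51)
      }
    }

Running the sketch with an IllegalStateException reaches the outer catch via
the MatchError, mirroring the unexpected halt reported above.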