Hello,

I have a question about error handling in a Spark job: if an exception occurs
during the job, what is the best way to be notified of the failure?
Can Spark jobs exit with different exit codes?
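
Concretely, what I have in mind is something like the following in the driver:
catch the failure, send some kind of alert, and exit with a non-zero status
(notifyFailure is just a hypothetical placeholder for whatever alerting we
would plug in). Part of my question is whether this is the recommended pattern,
and whether spark-submit actually surfaces that exit code:

import org.apache.spark.{SparkConf, SparkContext}

object ExampleJobWithExit {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test Job")
    val sc = new SparkContext(conf)
    val exitCode =
      try {
        val count = sc.parallelize(1 to 100).count()
        println(s"Count: $count")
        0 // success
      } catch {
        case e: Exception =>
          // hypothetical notification hook, e.g. notifyFailure(e), would go here
          System.err.println(s"Job failed: ${e.getMessage}")
          1 // failure
      } finally {
        sc.stop()
      }
    sys.exit(exitCode)
  }
}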

To see the current behavior, I wrote a dummy Spark job that just throws an
Exception:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object ExampleJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test Job")
    val sc = new SparkContext(conf)
    try {
      val count = sc.parallelize(1 to 100).count
      println(s"Count: $count")

      // simulate a failure in the driver
      throw new Exception("Fail!")

    } finally {
      sc.stop()
    }
  }

}

The spark-submit execution trace shows the error:
spark-submit --class com.test.ExampleJob test.jar
15/10/03 03:13:16 INFO SparkContext: Running Spark version 1.4.0
15/10/03 03:13:19 WARN SparkConf: In Spark 1.0 and later spark.local.dir
will be overridden by the value set by the cluster manager (via
SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
15/10/03 03:13:19 WARN SparkConf:
...
15/10/03 03:13:59 INFO DAGScheduler: Job 0 finished: count at
ExampleJob.scala:12, took 18.879104 s
Count: 100
15/10/03 03:13:59 INFO SparkUI: Stopped Spark web UI at []
15/10/03 03:13:59 INFO DAGScheduler: Stopping DAGScheduler
15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Shutting down all
executors
15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Asking each executor to
shut down
15/10/03 03:13:59 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
15/10/03 03:13:59 INFO Utils: path =
/data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596/blockmgr-d8e40805-3b8c-45f4-97b3-b89874158796,
already present as root for deletion.
15/10/03 03:13:59 INFO MemoryStore: MemoryStore cleared
15/10/03 03:13:59 INFO BlockManager: BlockManager stopped
15/10/03 03:13:59 INFO BlockManagerMaster: BlockManagerMaster stopped
15/10/03 03:13:59 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
15/10/03 03:13:59 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.Exception: Fail!
at com.test.ExampleJob$.main(ExampleJob.scala:14)
at com.test.ExampleJob.main(ExampleJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting
down remote daemon.
15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote
daemon shut down; proceeding with flushing remote transports.
15/10/03 03:13:59 INFO Utils: Shutdown hook called
15/10/03 03:13:59 INFO Utils: Deleting directory
/data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596
15/10/03 03:14:00 INFO RemoteActorRefProvider$RemotingTerminator: Remoting
shut down.


However, the Spark UI just shows the status as "FINISHED". Is this a
configuration error on my side?
[screenshot: Spark UI showing the application status as FINISHED]
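
What I would ultimately like to do is drive this from a small wrapper and
branch on the exit status of spark-submit, roughly like the sketch below
(SubmitWrapper is just an illustration of the intent; whether spark-submit
propagates the driver's exit status, and whether that depends on the deploy
mode, is exactly what I am unsure about):

import scala.sys.process._

object SubmitWrapper {
  def main(args: Array[String]): Unit = {
    // Launch the job and block until spark-submit returns.
    val exitCode = Seq("spark-submit", "--class", "com.test.ExampleJob", "test.jar").!
    if (exitCode != 0) {
      // Hypothetical notification step; here it just prints.
      println(s"Spark job failed: spark-submit exited with code $exitCode")
    }
    sys.exit(exitCode)
  }
}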


Thanks,

Isabelle
