Hello, I have a question about error handling in a Spark job: if an exception occurs during the job, what is the best way to get notified of the failure? Can Spark jobs return different exit codes?
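To clarify what I mean by exit codes, here is a rough sketch of the behavior I would like (catching the driver-side exception and translating it into a non-zero process exit status); the object name and structure are just illustrative, and I don't know whether this is the recommended pattern:

    import org.apache.spark.{SparkConf, SparkContext}

    object ExitCodeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Exit Code Sketch")
        val sc = new SparkContext(conf)
        var exitCode = 0
        try {
          val count = sc.parallelize(1 to 100).count()
          println(s"Count: $count")
        } catch {
          case e: Exception =>
            // Record the failure so it can be surfaced as the process exit status.
            e.printStackTrace()
            exitCode = 1
        } finally {
          // Stop the context before exiting so shutdown is clean.
          sc.stop()
        }
        // Exit non-zero only on failure, so whatever launched spark-submit
        // (cron, a workflow scheduler, etc.) can detect the failed run.
        if (exitCode != 0) sys.exit(exitCode)
      }
    }

Even if something like this works, I would still like to know whether there is a standard way for Spark itself to report the failure.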
To test the current behavior, I wrote a dummy Spark job that just throws an Exception, as follows:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object ExampleJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Test Job")
        val sc = new SparkContext(conf)
        try {
          val count = sc.parallelize(1 to 100).count
          println(s"Count: $count")
          throw new Exception("Fail!")  // deliberate failure
        } finally {
          sc.stop
        }
      }
    }

The spark-submit execution trace shows the error:

    spark-submit --class com.test.ExampleJob test.jar
    15/10/03 03:13:16 INFO SparkContext: Running Spark version 1.4.0
    15/10/03 03:13:19 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
    15/10/03 03:13:19 WARN SparkConf: ...
    15/10/03 03:13:59 INFO DAGScheduler: Job 0 finished: count at ExampleJob.scala:12, took 18.879104 s
    Count: 100
    15/10/03 03:13:59 INFO SparkUI: Stopped Spark web UI at []
    15/10/03 03:13:59 INFO DAGScheduler: Stopping DAGScheduler
    15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Shutting down all executors
    15/10/03 03:13:59 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
    15/10/03 03:13:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    15/10/03 03:13:59 INFO Utils: path = /data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596/blockmgr-d8e40805-3b8c-45f4-97b3-b89874158796, already present as root for deletion.
    15/10/03 03:13:59 INFO MemoryStore: MemoryStore cleared
    15/10/03 03:13:59 INFO BlockManager: BlockManager stopped
    15/10/03 03:13:59 INFO BlockManagerMaster: BlockManagerMaster stopped
    15/10/03 03:13:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    15/10/03 03:13:59 INFO SparkContext: Successfully stopped SparkContext
    Exception in thread "main" java.lang.Exception: Fail!
            at com.test.ExampleJob$.main(ExampleJob.scala:14)
            at com.test.ExampleJob.main(ExampleJob.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    15/10/03 03:13:59 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    15/10/03 03:13:59 INFO Utils: Shutdown hook called
    15/10/03 03:13:59 INFO Utils: Deleting directory /data1/spark/tmp/spark-d8c0a18f-6e45-46c8-a208-1e1ad36ae596
    15/10/03 03:14:00 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

However, the Spark UI just shows the status as "FINISHED". Is this a configuration error on my side?

[image: Inline image 1]

Thanks,
Isabelle