I am running a Spark Streaming application on a cluster composed of three nodes, each with one worker and three executors (so 9 executors in total). I am using Spark standalone mode.
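For reference, a minimal sketch of the setup and of the test I am performing; the host names, jar name, and resource sizes below are illustrative assumptions, not the actual values:

```shell
# Submit from node 1 in client mode against the standalone master
# (host names and the application jar are placeholders):
spark-submit \
  --master spark://node1:7077 \
  --deploy-mode client \
  --total-executor-cores 9 \
  my-streaming-app.jar

# Fault-tolerance test: on node 2, stop the worker service:
sudo service spark-worker stop
```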
The application is launched with a spark-submit command using the option --deploy-mode client. The submit command is run from one of the nodes, let's call it node 1.

As a fault-tolerance test I stop the worker on node 2 with the command sudo service spark-worker stop. In the logs I can see that the Master keeps trying to launch executors on the shutting-down worker (I can see thousands of attempts, all with status FAILED, within a few seconds), and then the whole application is terminated by Spark.

I tried to find more information about how Spark handles worker failures, but I was not able to find any useful answer. In the Spark source code I can see that the worker asks for the driver to be killed when the worker is stopped: see the onStop method here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala This might explain why the whole application is eventually stopped.

Is this the expected behavior when a worker is explicitly stopped? Is this a case of worker failure, or does it have to be considered differently (I am explicitly shutting down the node here)? Would the behavior be the same if the worker process were killed (rather than explicitly stopped)?

Thank you,
Davide

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Application-is-stopped-after-stopping-a-worker-tp29111.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.