Suraj Sharma created SPARK-32007:
------------------------------------

             Summary: Spark Driver Supervise does not work reliably
                 Key: SPARK-32007
                 URL: https://issues.apache.org/jira/browse/SPARK-32007
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.4.4
         Environment: ||Name||Value||
|Java Version|1.8.0_121 (Oracle Corporation)|
|Java Home|/usr/java/jdk1.8.0_121/jre|
|Scala Version|version 2.11.12|
|OS|Amazon Linux|
            Reporter: Suraj Sharma

I have a standalone cluster setup. I DO NOT have a streaming use case. I use AWS EC2 machines to run the Spark master and worker processes.

*Problem*: If a Spark worker machine running some drivers and executors dies, the drivers are not respawned on other healthy machines.

*Below are my findings:*
||Action/Behaviour||Executor||Driver||
|Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
|kill -9 to process|Relaunches on other machines|Relaunches on other machines|
|kill to process|Relaunches on other machines|Relaunches on other machines|

*Cluster Setup:*
 # I have a Spark standalone cluster
 # {{spark.driver.supervise=true}}
 # Spark Master HA is enabled and is backed by ZooKeeper
 # Spark version = 2.4.4
 # I am using a systemd script for the Spark worker process
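
For anyone trying to reproduce this, the following is a minimal sketch of how a supervised driver is typically submitted to a standalone cluster. It is not taken from the report. The helper name {{build_submit_command}}, the master host names, the main class, and the jar path are all hypothetical placeholders. The {{--deploy-mode cluster}} and {{--supervise}} flags are standard {{spark-submit}} options, and the comma-separated master URL is the usual form when Master HA is backed by ZooKeeper.

```python
from typing import List


def build_submit_command(
    master_url: str,
    main_class: str,
    app_jar: str,
    supervise: bool = True,
) -> List[str]:
    """Build a spark-submit argument list for standalone cluster mode.

    Hypothetical helper for illustration only; it just assembles the
    command line and does not talk to a cluster.
    """
    cmd = [
        "spark-submit",
        "--master", master_url,
        # Supervision only applies when the driver runs inside the cluster.
        "--deploy-mode", "cluster",
    ]
    if supervise:
        # Asks the standalone master to restart the driver on failure.
        cmd.append("--supervise")
    cmd += ["--class", main_class, app_jar]
    return cmd


if __name__ == "__main__":
    # Placeholder values: two HA masters, an example class and jar path.
    command = build_submit_command(
        "spark://master1:7077,master2:7077",
        "com.example.Main",
        "/path/to/app.jar",
    )
    print(" ".join(command))
```

Comparing driver behaviour with and without {{--supervise}} on the same cluster may help narrow down whether the missing relaunch after a worker machine stop is specific to supervision or to how the master detects the lost worker.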