If you’re using multiple masters with ZooKeeper, you should set your master 
URL to list both of them:

spark://host01:7077,host02:7077

and set the property spark.deploy.recoveryMode=ZOOKEEPER on the masters.

See here for more info: 
http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper
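
As a rough sketch of what that could look like (not tested against your cluster): the standalone docs describe passing the recovery properties to the master daemons through SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh. The ZooKeeper hosts zk1, zk2 and zk3 below are placeholders I made up, and spark.deploy.zookeeper.url is the companion property from the same docs page, so adjust both to your setup.

```shell
# conf/spark-env.sh on both host01 and host02 (sketch only).
# zk1/zk2/zk3 are placeholder ZooKeeper hosts, not taken from the original setup.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"
export SPARK_DAEMON_JAVA_OPTS

# Master URL listing both masters, so clients can try whichever one is LIVE.
MASTER_URL="spark://host01:7077,host02:7077"

# The submit command from the original mail, with only the master URL changed:
# bin/spark-submit --class SomeApp --deploy-mode cluster --supervise \
#   --master "$MASTER_URL" Some.jar
```

The masters need a restart to pick up the new options, and the workers should also be started with the two-master URL so they can re-register with whichever master takes over.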

From: James King
Date: Friday, May 8, 2015 at 11:22 AM
To: user
Subject: Submit Spark application in cluster mode and supervised

I have two hosts; let's call them host01 and host02.

I run one Master and two Workers on host01
I also run one Master and two Workers on host02

Now I have 1 LIVE Master on host01 and a STANDBY Master on host02
The LIVE Master is aware of all Workers in the cluster

Now I submit a Spark application using

bin/spark-submit --class SomeApp --deploy-mode cluster --supervise --master 
spark://host01:7077 Some.jar

This is to make the driver resilient to failure.

Now the interesting part:

If I stop the cluster (all daemons on all hosts) and restart the Master and 
Workers only on host01, the job resumes, as expected.

But if I stop the cluster (all daemons on all hosts) and restart the Master and 
Workers only on host02, the job does not resume execution. Why?

I can see the driver listed in the host02 WebUI, but there is no job 
execution. Please let me know why.

Am I wrong to expect it to resume execution in this case?




