Hi Ethan,

How are you specifying the master to Spark?
Recovering from a master failover is already handled by the underlying Mesos scheduler, but you have to use ZooKeeper instead of passing in the master URIs directly.

Tim

On Mon, Jan 12, 2015 at 12:44 PM, Ethan Wolf <ethan.w...@alum.mit.edu> wrote:

> We are running Spark and Spark Streaming on Mesos (with multiple masters
> for HA).
> At launch, our Spark jobs successfully look up the current Mesos master
> from ZooKeeper and spawn tasks.
>
> However, when the Mesos master changes while the Spark job is executing,
> the Spark driver seems to interact with the old Mesos master, and
> therefore fails to launch any new tasks.
> We are running long-running Spark Streaming jobs, so we have temporarily
> switched to coarse-grained mode as a workaround, but it prevents us from
> running in fine-grained mode, which we would prefer for some jobs.
>
> Looking at the code for MesosSchedulerBackend, it has an empty
> implementation of the reregistered (and disconnected) methods, which I
> believe would be called when the master changes:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L202
>
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/
>
> Are there any plans to implement master reregistration in the Spark
> framework, or does anyone have any suggested workarounds for long-running
> jobs to deal with the Mesos master changing? (Or is there something we are
> doing wrong?)
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Framework-handling-of-Mesos-master-change-tp21107.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
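
To make the ZooKeeper suggestion concrete, here is a minimal sketch (plain Python, standard library only) of what a ZooKeeper-based master URL looks like, following the mesos://zk://... form described in Spark's "Running on Mesos" documentation. The host names zk1 to zk3, the port 2181, and the /mesos znode are assumptions for illustration, and the helper name mesos_zk_master_url is hypothetical; substitute whatever your Mesos masters actually register under.

```python
def mesos_zk_master_url(zk_hosts, znode="/mesos"):
    """Build a Spark master URL that locates the leading Mesos master via ZooKeeper.

    zk_hosts: list of "host:port" strings for the ZooKeeper ensemble.
    znode:    ZooKeeper path the Mesos masters register under (assumed "/mesos").
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host:port is required")
    # Normalise the znode so it has exactly one leading slash and no trailing one.
    path = "/" + znode.strip("/")
    return "mesos://zk://" + ",".join(zk_hosts) + path


# Hypothetical three-node ZooKeeper ensemble; replace with your own hosts.
url = mesos_zk_master_url(["zk1:2181", "zk2:2181", "zk3:2181"])
print(url)  # mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos
```

The resulting string is what you would pass as the master, for example via spark-submit's --master option or SparkConf.setMaster, in place of a single mesos://host:5050 address.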