Hi,

We have been running Spark 1.0.2 with Mesos 0.20.1 in fine-grained mode, and for
the most part it has been working well.

We have been using mesos://zk://server1:2181,server2:2181,server3:2181/mesos as
the Spark master URL, and this works well for locating the Mesos leader.
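For readers unfamiliar with this URL form: after the mesos:// prefix it carries a ZooKeeper URL listing the ensemble members and the znode path under which the Mesos masters register. The helper below is a hypothetical sketch (not part of Spark or Mesos) that only splits the URL into those two parts, to show what the master URL actually names. It names a ZooKeeper ensemble, not a fixed leader host.

```python
from urllib.parse import urlparse


def parse_mesos_zk_url(master_url: str):
    """Hypothetical helper: split a mesos://zk://... master URL into
    the list of ZooKeeper host:port entries and the znode path."""
    prefix = "mesos://"
    if not master_url.startswith(prefix):
        raise ValueError("expected a URL starting with mesos://")
    # Strip the mesos:// prefix, leaving an ordinary zk:// URL.
    parsed = urlparse(master_url[len(prefix):])
    if parsed.scheme != "zk":
        raise ValueError("expected zk:// after mesos://")
    # The netloc holds the comma-separated ZooKeeper ensemble.
    hosts = parsed.netloc.split(",")
    return hosts, parsed.path


hosts, path = parse_mesos_zk_url(
    "mesos://zk://server1:2181,server2:2181,server3:2181/mesos")
print(hosts, path)
```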

Unfortunately, the leader can change while our Spark process is running, and
Spark appears to keep using the old leader instead of periodically querying Mesos
for the current one. As a result, Spark gets no resources from Mesos to launch
executors, and the driver simply hangs.

I checked the code (link below), and it seems to look up the leader only once at
startup and never re-check for the current leader (it is possible I misread the
code).

https://github.com/apache/spark/blob/a878660d2d7bb7ad9b5818a674e1e7c651077e78/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
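To make the suspected failure mode concrete, here is a toy model (hypothetical classes, not Spark's or Mesos's actual code) contrasting a client that caches the leader it found at startup with one that re-resolves the leader on every use. After a leader change, only the second client sees the new leader.

```python
class LeaderRegistry:
    """Hypothetical stand-in for ZooKeeper: stores the current Mesos leader."""

    def __init__(self, leader: str):
        self.leader = leader

    def current_leader(self) -> str:
        return self.leader


class ResolveOnceClient:
    """Looks up the leader once at construction and keeps using it."""

    def __init__(self, registry: LeaderRegistry):
        self._leader = registry.current_leader()

    def leader(self) -> str:
        return self._leader


class ResolveEachTimeClient:
    """Asks the registry for the leader on every call."""

    def __init__(self, registry: LeaderRegistry):
        self._registry = registry

    def leader(self) -> str:
        return self._registry.current_leader()


registry = LeaderRegistry("master-a:5050")
once = ResolveOnceClient(registry)
each = ResolveEachTimeClient(registry)

# Simulate a Mesos leader failover while the clients are running.
registry.leader = "master-b:5050"
print(once.leader(), each.leader())
```

The cached client keeps pointing at master-a, which mirrors the hang described above if the real code behaves this way.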

Has anyone else encountered this issue, or does anyone know how to resolve it?

Thanks,
Mahesh

________________________________
This E-mail and any of its attachments may contain Time Warner Cable 
proprietary information, which is privileged, confidential, or subject to 
copyright belonging to Time Warner Cable. This E-mail is intended solely for 
the use of the individual or entity to which it is addressed. If you are not 
the intended recipient of this E-mail, you are hereby notified that any 
dissemination, distribution, copying, or action taken in relation to the 
contents of and attachments to this E-mail is strictly prohibited and may be 
unlawful. If you have received this E-mail in error, please notify the sender 
immediately and permanently delete the original and any copy of this E-mail and 
any printout.
