Hey,

I am facing a weird issue. 

My Spark workers keep dying every now and then, and in the master logs I keep
seeing the following messages:

14/05/14 10:09:24 WARN Master: Removing worker-20140514080546-x.x.x.x-50737 because we got no heartbeat in 60 seconds
14/05/14 14:18:41 WARN Master: Removing worker-20140514123848-x.x.x.x-50901 because we got no heartbeat in 60 seconds

In my cluster, I have one master node and four worker nodes. 

On the cluster I am trying to run Shark and related queries.

I tried setting the property spark.worker.timeout=300 on all workers and the
master, but the log messages still report a 60-second timeout.
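
To check which timeout the master is actually applying, the removal messages
can be pulled out of the master log with a small script. This is only a rough
sketch: it assumes the message format shown in the log excerpt above, and the
names removal_events and timeouts_seen are made up for illustration, not part
of Spark.

```python
import re
from collections import Counter

# Matches master log messages of the form seen in this post, e.g.
# "14/05/14 10:09:24 WARN Master: Removing worker-... because we got
#  no heartbeat in 60 seconds". The format is assumed from that excerpt.
REMOVAL = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) WARN Master: "
    r"Removing (?P<worker>worker-\S+) because we got no heartbeat "
    r"in (?P<timeout>\d+) seconds"
)


def removal_events(log_text):
    """Return (timestamp, worker_id, timeout_seconds) per removal message."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("ts"), m.group("worker"), int(m.group("timeout")))
        for m in REMOVAL.finditer(flat)
    ]


def timeouts_seen(log_text):
    """Count how often each timeout value appears in removal messages."""
    return Counter(timeout for _, _, timeout in removal_events(log_text))


# The two messages quoted in this post, including the mail line wrapping.
SAMPLE = """
14/05/14 10:09:24 WARN Master: Removing worker-20140514080546-x.x.x.x-50737
because we got no heartbeat in 60 seconds
14/05/14 14:18:41 WARN Master: Removing worker-20140514123848-x.x.x.x-50901
because we got no heartbeat in 60 seconds
"""

for ts, worker, timeout in removal_events(SAMPLE):
    print(ts, worker, timeout)
print(dict(timeouts_seen(SAMPLE)))
```

If every removal message still reports 60 after the daemons are restarted,
the 300-second setting is presumably not reaching the master process.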


After that, I keep seeing the following messages as well:

14/05/14 16:59:52 INFO Master: Removing app app-20140514164003-0009

On the worker nodes, in the work folder, I can't find any suspicious
messages.

Any help figuring out what is causing all this would be appreciated.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-workers-keep-getting-disconnected-Keep-dying-from-the-cluster-tp5740.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
