Hey, I am facing a weird issue.
My Spark workers keep dying every now and then, and in the master logs I keep seeing the following messages:

14/05/14 10:09:24 WARN Master: Removing worker-20140514080546-x.x.x.x-50737 because we got no heartbeat in 60 seconds
14/05/14 14:18:41 WARN Master: Removing worker-20140514123848-x.x.x.x-50901 because we got no heartbeat in 60 seconds

My cluster has one master node and four worker nodes, and I am trying to run Shark and related queries on it. I tried setting the property spark.worker.timeout=300 on all workers and on the master, but the log still shows a 60-second timeout.

After that, I also keep seeing messages like the following:

14/05/14 16:59:52 INFO Master: Removing app app-20140514164003-0009

On the worker nodes, I can't find any suspicious messages in the work folder. Any help figuring out what is causing all this would be appreciated.
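In case it helps with diagnosis, here is a minimal Python sketch that pulls the "Removing worker" warnings out of a master log and reports how long each worker had been up and which timeout the master actually applied. It assumes the log timestamps are in yy/MM/dd HH:mm:ss form and that the digits in the worker ID are the worker's start time as yyyyMMddHHmmss; the function name removed_workers is made up for this example.

```python
import re
from datetime import datetime

# Pattern for the master's "Removing worker" warning as quoted in this post.
# Assumption: the 14 digits in the worker ID are its start time (yyyyMMddHHmmss).
REMOVED = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) WARN Master: "
    r"Removing (?P<worker>worker-(?P<start>\d{14})-\S+) "
    r"because we got no heartbeat in (?P<timeout>\d+) seconds"
)


def removed_workers(log_text):
    """Yield (worker_id, uptime, timeout_seconds) for each removal warning."""
    for m in REMOVED.finditer(log_text):
        # Assumption: log timestamps are yy/MM/dd HH:mm:ss.
        removed_at = datetime.strptime(m.group("ts"), "%y/%m/%d %H:%M:%S")
        started_at = datetime.strptime(m.group("start"), "%Y%m%d%H%M%S")
        yield m.group("worker"), removed_at - started_at, int(m.group("timeout"))


if __name__ == "__main__":
    sample = (
        "14/05/14 10:09:24 WARN Master: Removing "
        "worker-20140514080546-x.x.x.x-50737 because we got no heartbeat in 60 seconds\n"
        "14/05/14 14:18:41 WARN Master: Removing "
        "worker-20140514123848-x.x.x.x-50901 because we got no heartbeat in 60 seconds\n"
    )
    for worker, uptime, timeout in removed_workers(sample):
        print(worker, "was up for", uptime, "with timeout", timeout, "s")
```

If those assumptions hold, the two workers above ran for roughly two hours and roughly one hour forty minutes before being dropped, and the timeout reported by the master is 60 seconds in both cases, not the 300 I configured.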