Hi, I am running PySpark on a cluster. It generally runs fine, but I frequently get the following warning (and, consequently, the task is not executed):
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

This is strange because all my nodes have the same specifications and the same data. Why does it work sometimes and not other times? Looking at the log file, I see entries like this:

15/01/14 11:42:11 INFO Worker: Disassociated [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077] Disassociated !
15/01/14 11:42:11 ERROR Worker: Connection to master failed! Waiting for master to reconnect...
15/01/14 11:42:11 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkExecutor@node001.cluster:35231]: Error [Association failed with [akka.tcp://sparkExecutor@node001.cluster:35231]] [
  akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@node001.cluster:35231]
  Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: node001.cluster/172.16.6.101:35231
]
15/01/14 11:42:11 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077]: Error [Association failed with [akka.tcp://sparkMaster@node001:7077]] [
  akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@node001:7077]
  Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: node001/172.16.6.101:7077
]
15/01/14 11:42:11 INFO Worker: Disassociated [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077] Disassociated !
15/01/14 11:42:11 ERROR Worker: Connection to master failed! Waiting for master to reconnect...
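For reference, a typical standalone-mode submission with explicit resource limits looks roughly like this; the master URL, script name, and resource values below are illustrative placeholders, not my exact settings:

```shell
# Sketch of a spark-submit invocation against a standalone master.
# node001, 2g, 4, and my_job.py are placeholders.
spark-submit \
  --master spark://node001:7077 \
  --executor-memory 2g \
  --total-executor-cores 4 \
  my_job.py
```

If the requested executor memory or cores exceed what the registered workers can offer, the scheduler keeps waiting and emits exactly the "Initial job has not accepted any resources" warning above.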
The log also contains:

15/01/14 11:42:11 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$ExecutorStateChanged] from Actor[akka://sparkWorker/user/Worker#-1661660308] to Actor[akka://sparkWorker/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'

Does anybody have an idea? I would greatly appreciate it.

-Tassilo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issues-running-spark-on-cluster-tp21138.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.