I am running Spark on a Cloudera cluster, Spark version 0.9.0-cdh5.0.0-beta-2.
While nothing else is running on the cluster, I am seeing frequent worker failures with errors like:

AssociationError [akka.tcp://sparkWorker@worker5:7078] -> [akka.tcp://sparkExecutor@worker5:37487]: Error [Association failed with [akka.tcp://sparkExecutor@worker5:37487]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@worker5:37487]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: worker5/172.21.10.128:37487
]

These errors do not occur while a Spark job is running. In fact, jobs run to completion without errors; the workers die some time afterwards.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/workers-die-with-AssociationError-tp2891.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.