I am running Spark on a Cloudera cluster, Spark version
0.9.0-cdh5.0.0-beta-2.

Even while nothing else is running on the cluster, I am seeing frequent worker
failures with errors like:

AssociationError [akka.tcp://sparkWorker@worker5:7078] ->
[akka.tcp://sparkExecutor@worker5:37487]: Error [Association failed with
[akka.tcp://sparkExecutor@worker5:37487]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker5:37487]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker5/172.21.10.128:37487
]

These errors do not occur while a Spark job is running; in fact, jobs run to
completion without errors. Some time afterwards, however, the workers die.

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/workers-die-with-AssociationError-tp2891.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
