Hi, I am running PySpark on a cluster. It generally runs fine, but I frequently get the following warning (and, consequently, the task is not executed):
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

This is strange because all my nodes have the same specifications and the same data. Why does it work sometimes and not other times? Looking at the log file, I see entries like this:

15/01/14 11:42:11 INFO Worker: Disassociated [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077] Disassociated !
15/01/14 11:42:11 ERROR Worker: Connection to master failed! Waiting for master to reconnect...
15/01/14 11:42:11 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkExecutor@node001.cluster:35231]: Error [Association failed with [akka.tcp://sparkExecutor@node001.cluster:35231]] [
  akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@node001.cluster:35231]
  Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: node001.cluster/172.16.6.101:35231
]
15/01/14 11:42:11 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077]: Error [Association failed with [akka.tcp://sparkMaster@node001:7077]] [
  akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@node001:7077]
  Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: node001/172.16.6.101:7077
]
15/01/14 11:42:11 INFO Worker: Disassociated [akka.tcp://sparkWorker@node001.cluster:50198] -> [akka.tcp://sparkMaster@node001:7077] Disassociated !
15/01/14 11:42:11 ERROR Worker: Connection to master failed! Waiting for master to reconnect...
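For reference, a typical standalone-mode submission with explicit resource limits looks roughly like this; the master URL, script name, and resource values below are illustrative placeholders, not my exact settings:

```shell
# Sketch of a spark-submit invocation against a standalone master.
# node001, 2g, 4, and my_job.py are placeholders.
spark-submit \
  --master spark://node001:7077 \
  --executor-memory 2g \
  --total-executor-cores 4 \
  my_job.py
```

If the requested executor memory or cores exceed what the registered workers can offer, the scheduler keeps waiting and emits exactly the "Initial job has not accepted any resources" warning above.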
The log also contains:

15/01/14 11:42:11 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$ExecutorStateChanged] from Actor[akka://sparkWorker/user/Worker#-1661660308] to Actor[akka://sparkWorker/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'

Does anybody have an idea? I would greatly appreciate it.

-Tassilo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issues-running-spark-on-cluster-tp21138.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.