Hi,
I believe this is some kind of timeout problem but can't figure out how to 
increase it.
I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task 
which first loads a big RDD from HBase - I can see in the screen output that all 
the executors fire up, then there is no more logging output for the next two 
minutes, after which I get plenty of:
15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
This points to some ~120 s timeout being hit while the nodes are loading the big RDD? 
Any ideas how to get around it?
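
For context, the load itself is roughly along these lines (a simplified sketch only; 
the app name, table name, ZooKeeper quorum and the pythonconverters classes from the 
Spark examples jar below are placeholders, not my exact code):

    from pyspark import SparkContext

    sc = SparkContext(appName="hbase-load")  # placeholder app name

    # Placeholder connection details - the real job has its own table and quorum.
    hbase_conf = {
        "hbase.zookeeper.quorum": "node01",
        "hbase.mapreduce.inputtable": "mytable",
    }

    # Read the HBase table as an RDD of (rowkey, result) strings, using the
    # converter classes shipped in the Spark examples jar (must be on the classpath).
    rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
        conf=hbase_conf)

    print(rdd.count())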
FYI, I already use the following options, without any success:
    spark.core.connection.ack.wait.timeout: 600
    spark.akka.timeout: 1000
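
For completeness, this is roughly how those two settings are applied (placeholder app 
name; the same values could equally be passed to spark-submit with --conf key=value):

    from pyspark import SparkConf, SparkContext

    # Sketch of applying the two timeout settings above before the SparkContext
    # is created; "hbase-load" is just a placeholder app name.
    conf = (SparkConf()
            .setAppName("hbase-load")
            .set("spark.core.connection.ack.wait.timeout", "600")
            .set("spark.akka.timeout", "1000"))
    sc = SparkContext(conf=conf)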

thanks,
Antony.
