Hello,

Sometimes a job stops in the *middle* of its execution, and the master then
shows its status as FINISHED.

There isn't anything wrong in the shell/submit output.

In the executor logs, I see entries like this:

15/03/04 21:24:51 INFO MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://sparkDriver@ip-10-0-10-17.ec2.internal:40019/user/MapOutputTracker#893807065]
15/03/04 21:24:51 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 38, fetching them
15/03/04 21:24:55 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@ip-10-0-11-9.ec2.internal:54766] -> [akka.tcp://sparkDriver@ip-10-0-10-17.ec2.internal:40019] disassociated! Shutting down.
15/03/04 21:24:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@ip-10-0-10-17.ec2.internal:40019] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

How can I investigate further?
Thanks
