Attaching the log from when the dev job gets stuck (once all of its executors
are lost to preemption). This is a spark-shell job running in yarn-client mode.

On Wed, Aug 26, 2015 at 10:45 AM, Sadhan Sood <> wrote:

> Hi All,
> We've set up our Spark cluster on AWS, running on YARN (Hadoop 2.3) with
> fair scheduling and preemption turned on. The cluster is shared between
> prod and dev work: prod runs with a higher fair share and can preempt dev
> jobs if there are not enough resources available for it.
> It appears that dev jobs which get preempted often become unstable after
> losing some executors, and the whole job either gets stuck (without making
> any progress) or ends up getting restarted (and hence losing all the work
> done). Has anyone encountered this before? Is the solution just to set
> spark.task.maxFailures to a really high value to recover from task failures
> in such scenarios? Are there other approaches that people have taken for
> Spark multi-tenancy that work better in such a scenario?
> Thanks,
> Sadhan

Attachment: spark_job_stuck.log
Description: Binary data
