Attaching a log from when the dev job gets stuck (once all of its executors
are lost to preemption). This is a spark-shell job running in yarn-client
mode.

On Wed, Aug 26, 2015 at 10:45 AM, Sadhan Sood <sadhan.s...@gmail.com> wrote:

> Hi All,
>
> We've set up our Spark cluster on AWS running on YARN (Hadoop 2.3) with
> fair scheduling and preemption turned on. The cluster is shared between
> prod and dev work, where prod runs with a higher fair share and can
> preempt dev jobs if there are not enough resources available for it.
> It appears that dev jobs which get preempted often become unstable after
> losing some executors: the whole job gets stuck (without making any
> progress) or ends up getting restarted (and hence loses all the work done).
> Has anyone encountered this before? Is the solution just to set
> spark.task.maxFailures
> to a really high value to recover from task failures in such scenarios? Are
> there other approaches people have taken for Spark multi-tenancy that
> work better in such scenarios?
>
> Thanks,
> Sadhan
>

Attachment: spark_job_stuck.log
Description: Binary data
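
For reference, a minimal sketch of the configuration angle raised above,
assuming a Spark 1.x job submitted against YARN in client mode. The property
name spark.task.maxFailures is a real Spark setting, but the value shown is a
hypothetical example, not a recommendation; for a spark-shell session it would
normally be passed with --conf at launch rather than set in code as below.

    // Minimal sketch (Spark 1.x era, yarn-client mode). Values are
    // hypothetical examples only; in spark-shell these would normally be
    // supplied with --conf on the command line at launch.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("dev-job")        // hypothetical application name
      .setMaster("yarn-client")     // client mode, as in the attached log
      // Tolerate more task failures so tasks lost on preempted executors
      // can be retried instead of aborting the whole job.
      .set("spark.task.maxFailures", "16")

    val sc = new SparkContext(conf)

Note that raising spark.task.maxFailures only lets the job survive more task
retries; it does not bring preempted executors back, so whether it is enough
depends on how aggressively the prod queue preempts.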

