Attaching a log from when the dev job gets stuck (once all of its executors are lost due to preemption). This is a spark-shell job running in yarn-client mode.
On Wed, Aug 26, 2015 at 10:45 AM, Sadhan Sood <sadhan.s...@gmail.com> wrote:
> Hi All,
>
> We've set up our Spark cluster on AWS running on YARN (on Hadoop 2.3)
> with fair scheduling and preemption turned on. The cluster is shared
> between prod and dev work, where prod runs with a higher fair share and
> can preempt dev jobs if there are not enough resources available for it.
> It appears that dev jobs which get preempted often become unstable after
> losing some executors: the whole job gets stuck (without making any
> progress) or ends up getting restarted (and hence loses all the work
> done). Has someone encountered this before? Is the solution just to set
> spark.task.maxFailures to a really high value to recover from task
> failures in such scenarios? Are there other approaches that people have
> taken for Spark multi-tenancy that work better in such scenarios?
>
> Thanks,
> Sadhan
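For reference, a minimal sketch of the knobs in question (assuming Spark 1.x on YARN; spark.task.maxFailures and spark.yarn.max.executor.failures are standard Spark settings, but the values and the "dev" queue name below are only illustrative and would need tuning for a real cluster):

    # Launch the dev spark-shell with a higher tolerance for task and
    # executor failures, so tasks killed by preemption don't fail the
    # whole job. spark.task.maxFailures defaults to 4;
    # spark.yarn.max.executor.failures defaults to twice the number of
    # requested executors.
    spark-shell --master yarn-client \
      --queue dev \
      --conf spark.task.maxFailures=20 \
      --conf spark.yarn.max.executor.failures=200

Note that in yarn-client mode these have to be set at launch time; they cannot be changed once the SparkContext is up.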
spark_job_stuck.log