Hi,

I have a Spark job running in yarn-client mode. At some point during the
join stage, an executor (container) runs out of memory and YARN kills it.
Because of this, the entire job restarts, and it keeps restarting on every
failure.

What is the best way to checkpoint? I see there is a checkpoint API, and
another option might be to persist before the join stage. Would either of
those prevent a retry of the entire job? Or is there a way to retry only
the tasks that were assigned to the failed executor?

Thanks
