Hi, I have a spark job running on yarn-client mode. At some point during Join stage, executor(container) runs out of memory and yarn kills it. Due to this Entire job restarts! and it keeps doing it on every failure?
What is the best way to checkpoint? I see there's checkpoint api and other option might be to persist before Join stage. Would that prevent retry of entire job? How about just retrying only the task that was distributed to that faulty executor? Thanks -- [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/> <https://www.nyse.com/quote/XNYS:XTLY> [image: LinkedIn] <https://www.linkedin.com/company/xactly-corporation> [image: Twitter] <https://twitter.com/Xactly> [image: Facebook] <https://www.facebook.com/XactlyCorp> [image: YouTube] <http://www.youtube.com/xactlycorporation>