Stage failure in BlockManager due to FileNotFoundException on long-running streaming job

2014-08-20 Thread Silvio Fiorito
This is a long-running Spark Streaming job running on YARN, Spark v1.0.2 on CDH5. The job runs for about 34-37 hours, then dies due to this FileNotFoundException. There’s very little CPU or RAM usage; I’m running 2 x cores, 2 x executors, 4g memory, YARN cluster mode. Here’s the stack

Re: Stage failure in BlockManager due to FileNotFoundException on long-running streaming job

2014-08-20 Thread Aaron Davidson
This is likely due to a bug in shuffle file consolidation (which you have enabled), which was hopefully fixed in 1.1 with this patch: https://github.com/apache/spark/commit/78f2af582286b81e6dc9fa9d455ed2b369d933bd Until 1.0.3 or 1.1 is released, the simplest solution is to disable

Re: Stage failure in BlockManager due to FileNotFoundException on long-running streaming job

2014-08-20 Thread Silvio Fiorito
Thanks, I’ll go ahead and disable that setting for now.
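
For readers following this thread, here is a minimal sketch of what disabling shuffle file consolidation could look like in a Spark 1.x driver. It assumes the setting discussed above is the `spark.shuffle.consolidateFiles` property; the thread itself does not name it. The object name and application name are hypothetical, and Spark must be on the classpath.

```scala
import org.apache.spark.SparkConf

object DisableShuffleConsolidation {
  def main(args: Array[String]): Unit = {
    // Assumption: the property below is the consolidation setting
    // referred to in the thread (Spark 1.x configuration key).
    val conf = new SparkConf()
      .setAppName("streaming-job") // hypothetical application name
      .set("spark.shuffle.consolidateFiles", "false")

    // Print the effective value so it can be checked before the job starts.
    println(conf.get("spark.shuffle.consolidateFiles"))
  }
}
```

The same `SparkConf` would then be passed to the job's `StreamingContext`. If the property was enabled in a properties file rather than in code, it would need to be changed there as well.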