Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Nilesh Chakraborty
Hello, In a thread about java.lang.StackOverflowError when calling count() [1] I saw Tathagata Das share an interesting approach for truncating RDD lineage: it helps prevent StackOverflowErrors in high-iteration jobs while avoiding the performance penalty of writing to disk. Here's an excerpt from

Re: Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Baoxu Shi(Dash)
I’m facing the same situation. It would be great if someone could provide a code snippet as an example. On Jun 28, 2014, at 12:36 PM, Nilesh Chakraborty nil...@nileshc.com wrote: Hello, In a thread about java.lang.StackOverflowError when calling count() [1] I saw Tathagata Das share an