On 29 Oct 2013, at 02:47, Matei Zaharia <matei.zaha...@gmail.com> wrote:
Yes, we still write out data after these tasks in Spark 0.8, and it needs to
be written out before any stage that reads it can start. The main reasons are
simplicity when there are faults, as well as more flexible scheduling.
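[Editorial note: the mechanism described above — map tasks persisting their partitioned output before any reduce task starts — can be illustrated with a toy sketch. This is not Spark's actual shuffle implementation; the file layout, function names, and integer-keyed partitioning below are invented purely for illustration.]

```python
import json
import os
import tempfile
from collections import defaultdict

def map_task(records, num_reducers, out_dir, task_id):
    # Partition the map output by key, as a shuffle would, and persist
    # every partition to storage. Nothing downstream can run until this
    # completes -- which is what makes re-running a lost task cheap.
    buckets = defaultdict(list)
    for key, value in records:
        buckets[key % num_reducers].append((key, value))  # assumes int keys
    for r in range(num_reducers):
        path = os.path.join(out_dir, f"map{task_id}_reduce{r}.json")
        with open(path, "w") as f:
            json.dump(buckets[r], f)

def reduce_task(reduce_id, num_mappers, out_dir):
    # A reduce task starts only after all map outputs exist on disk; it
    # fetches its own partition from every map task's persisted output.
    # Because the outputs are materialized, the scheduler is free to place
    # this task anywhere, any time, without pre-assigning reducers.
    merged = defaultdict(list)
    for m in range(num_mappers):
        path = os.path.join(out_dir, f"map{m}_reduce{reduce_id}.json")
        with open(path) as f:
            for key, value in json.load(f):
                merged[key].append(value)
    return dict(merged)

# Usage: two map tasks write their outputs, then two reduce tasks read them.
tmp = tempfile.mkdtemp()
map_task([(1, "a"), (2, "b")], 2, tmp, 0)
map_task([(1, "c"), (3, "d")], 2, tmp, 1)
even_keys = reduce_task(0, 2, tmp)  # {2: ["b"]}
odd_keys = reduce_task(1, 2, tmp)   # {1: ["a", "c"], 3: ["d"]}
```

If a reduce task fails here, only that task is re-run against the already-persisted map files; the map stage never re-executes, which is the fault-tolerance simplicity the reply refers to.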
Hey everybody,
I just watched the Spark Internals presentation [1] from the December 2012 dev
meetup and have a couple of questions regarding the output of tasks before a
shuffle.
1. Can anybody confirm that the default is still to persist stage output to
RAM/disk and then have the following