Hey everybody,
I just watched the Spark Internals presentation [1] from the December 2012 dev
meetup and have a couple of questions regarding the output of tasks before a
shuffle.
1. Can anybody confirm that the default is still to persist stage output to
RAM/disk and then have the following stage read it?
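For concreteness, here is a toy job with a single shuffle boundary (the file paths and app name are just placeholders); everything before reduceByKey runs as map tasks in the first stage, and the tasks of the second stage read their output:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // implicit conversions that add reduceByKey on (K, V) RDDs

    val sc = new SparkContext("local", "shuffle-example")

    // First stage: map-side tasks produce (word, 1) pairs and write their output out.
    val pairs = sc.textFile("input.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Shuffle boundary: reduceByKey starts a new stage whose tasks read the map output.
    val counts = pairs.reduceByKey(_ + _)
    counts.saveAsTextFile("counts")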
On 29 Oct 2013, at 02:47, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Hi Ufuk,
Yes, we still write out data after these tasks in Spark 0.8, and it needs to be
written out before any stage that reads it can start. The main reason is
simplicity when there are faults, as well as more flexible scheduling (you
don't have to decide where each reduce task is in advance).
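To illustrate the point, here is a toy sketch only (not Spark's actual scheduler or map-output-tracker code; all names are made up): the parent stage's map tasks all write their output and report where it lives before any task of the next stage is launched, which is what makes both recovery from a lost output and free placement of reduce tasks simple.

    // Toy sketch of the barrier between a map stage and the stage that reads it.
    case class MapStatus(mapId: Int, location: String)  // where one map task wrote its output

    class OutputTracker {
      private var statuses = Map.empty[Int, MapStatus]
      def register(s: MapStatus): Unit = statuses += (s.mapId -> s)
      def allDone(numMaps: Int): Boolean = statuses.size == numMaps
      def locations: Seq[String] = statuses.values.map(_.location).toSeq
    }

    def runShuffle(numMaps: Int, numReduces: Int): Unit = {
      val tracker = new OutputTracker

      // 1) Every map task writes its output (to RAM or disk) and reports its location.
      for (m <- 0 until numMaps)
        tracker.register(MapStatus(m, location = s"host-${m % 4}"))

      // 2) Only when all map output exists can the next stage start. Its tasks fetch
      //    from the recorded locations, so they can be scheduled on any node, and a
      //    lost output can be recomputed by rerunning just that map task.
      require(tracker.allDone(numMaps), "reading stage must wait for all map output")
      for (r <- 0 until numReduces)
        println(s"reduce task $r fetches from: " + tracker.locations.mkString(", "))
    }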