Re: Task output before a shuffle

2013-10-29 Ufuk Celebi
On 29 Oct 2013, at 02:47, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Yes, we still write out data after these tasks in Spark 0.8, and it needs to be written out before any stage that reads it can start. The main reason is simplicity when there are faults, as well as more flexible scheduling
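To make the "simplicity when there are faults" argument concrete, here is a toy, single-process model (not Spark 0.8's actual code). Every map task writes its full, hash-partitioned output to a store before any reducer runs. If one map output is later lost, only that map task is re-run from its input, and then the reduce stage proceeds. The names `ShuffleStore`, `run_map_task` and `run_shuffle` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

class ShuffleStore:
    """Hypothetical stand-in for the RAM/disk location holding map outputs."""

    def __init__(self):
        self.outputs = {}  # map_id -> {reduce_partition: [(key, value), ...]}

    def write(self, map_id, buckets):
        self.outputs[map_id] = buckets

    def lose(self, map_id):
        # Simulate losing one map task's output (e.g. its worker died).
        self.outputs.pop(map_id, None)


def run_map_task(map_id, records, num_reducers, store):
    # Hash-partition this task's records into one bucket per reducer,
    # then write the complete output before the task is considered done.
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    store.write(map_id, dict(buckets))


def run_shuffle(map_inputs, num_reducers, store, fail_map=None):
    # Map stage: all map outputs are materialized before any reducer starts.
    for map_id, records in enumerate(map_inputs):
        run_map_task(map_id, records, num_reducers, store)

    # Optionally simulate a fault after the map stage has finished.
    if fail_map is not None:
        store.lose(fail_map)

    # Recovery: re-run only the map tasks whose output is missing.
    recomputed = []
    for map_id, records in enumerate(map_inputs):
        if map_id not in store.outputs:
            run_map_task(map_id, records, num_reducers, store)
            recomputed.append(map_id)

    # Reduce stage: each reducer fetches its bucket from every map output
    # and sums the values per key.
    result = {}
    for r in range(num_reducers):
        totals = defaultdict(int)
        for map_id in range(len(map_inputs)):
            for key, value in store.outputs[map_id].get(r, []):
                totals[key] += value
        result.update(totals)  # keys never collide across reducers
    return result, recomputed


if __name__ == "__main__":
    inputs = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("c", 5)]]
    result, recomputed = run_shuffle(inputs, 2, ShuffleStore(), fail_map=1)
    print(sorted(result.items()))  # [('a', 4), ('b', 6), ('c', 5)]
    print(recomputed)              # [1]
```

Because each map output is complete and stored independently, recovery only needs to re-run the single missing task rather than the whole stage. The reducers could also be scheduled in any order once the map stage has finished. Pipelining map output directly into running reducers would couple the two stages, which is the extra complexity the reply describes avoiding.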

Task output before a shuffle

2013-10-28 Ufuk Celebi
Hey everybody,

I just watched the Spark Internals presentation [1] from the December 2012 dev meetup and have a couple of questions regarding the output of tasks before a shuffle.

1. Can anybody confirm that the default is still to persist stage output to RAM/disk and then have the following