Re: Task output before a shuffle

2013-10-29 Thread Ufuk Celebi
On 29 Oct 2013, at 02:47, Matei Zaharia matei.zaha...@gmail.com wrote:
> Yes, we still write out data after these tasks in Spark 0.8, and it needs to be written out before any stage that reads it can start. The main reason is simplicity when there are faults, as well as more flexible scheduling

Task output before a shuffle

2013-10-28 Thread Ufuk Celebi
Hey everybody,
I just watched the Spark Internals presentation [1] from the December 2012 dev meetup and have a couple of questions regarding the output of tasks before a shuffle.
1. Can anybody confirm that the default is still to persist stage output to RAM/disk and then have the following

Re: Task output before a shuffle

2013-10-28 Thread Matei Zaharia
Hi Ufuk,
Yes, we still write out data after these tasks in Spark 0.8, and it needs to be written out before any stage that reads it can start. The main reason is simplicity when there are faults, as well as more flexible scheduling (you don't have to decide where each reduce task is in