Hi, Shuffle output goes to local disk each time, as far as I know, never to memory.
On Fri, Oct 2, 2015 at 1:26 PM Adrian Tanase <atan...@adobe.com> wrote: > I’m not sure this is related to memory management – the shuffle is the > central act of moving data around nodes when the computations need the data > on another node (E.g. Group by, sort, etc) > > Shuffle read and shuffle write should be mirrored on the left/right side > of a shuffle between 2 stages. > > -adrian > > From: Kartik Mathur > Date: Thursday, October 1, 2015 at 10:36 PM > To: user > Subject: Shuffle Write v/s Shuffle Read > > Hi > > I am trying to better understand shuffle in spark . > > Based on my understanding thus far , > > *Shuffle Write* : writes stage output for intermediate stage on local > disk if memory is not sufficient., > Example , if each worker has 200 MB memory for intermediate results and > the results are 300MB then , each executer* will keep 200 MB in memory > and will write remaining 100 MB on local disk . * > > *Shuffle Read : *Each executer will read from other executer's *memory + > disk , so total read in above case will be 300(200 from memory and 100 from > disk)*num of executers ? * > > Is my understanding correct ? > > Thanks, > Kartik >