Hi everyone,
The possibility of in-memory shuffling was discussed in 2015 in this pull request: https://github.com/apache/spark/pull/5403.
In 2016, the paper "Scaling Spark on HPC Systems" stated that Spark still shuffles using disks. I would like to know:
What is the current state of in-memory shuffling?
Is it implemented and usable in production?
Does the current shuffle still rely on disks?
Is it possible to somehow perform the shuffle in RAM only?

Regards,
Thomas