Don't stages by definition include a shuffle? If you didn't need a shuffle between two stages, you could merge them into one stage.
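
To make that reasoning concrete, here is a toy sketch in plain Python. It is not Spark's scheduler code. It assumes a job is a flat list of operations, each tagged as narrow (no shuffle needed) or wide (needs a shuffle of its input). The names `Op` and `split_into_stages` are hypothetical, made up for this illustration. Consecutive narrow operations end up in the same stage, and a new stage starts only where a shuffle is required.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    wide: bool  # True if the operation needs its input shuffled (e.g. reduceByKey)


def split_into_stages(ops: List[Op]) -> List[List[str]]:
    """Group operations into stages, cutting only at shuffle boundaries.

    Simplification: the wide operation is placed at the head of the new
    stage; real Spark splits its work across both sides of the shuffle.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        # A wide operation closes the stage built so far and opens a new one.
        if op.wide and current:
            stages.append(current)
            current = []
        current.append(op.name)
    if current:
        stages.append(current)
    return stages


pipeline = [
    Op("map", wide=False),
    Op("filter", wide=False),
    Op("reduceByKey", wide=True),
    Op("mapValues", wide=False),
]
stages = split_into_stages(pipeline)
shuffle_count = len(stages) - 1  # one shuffle between each pair of adjacent stages
print(stages, shuffle_count)
```

In this model the number of shuffles is always one less than the number of stages, so the last stage of a job has no shuffle after it.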

thanks,
krexos

------- Original Message -------
On Saturday, July 2nd, 2022 at 4:13 PM, Sean Owen <sro...@gmail.com> wrote:

> Because only shuffle stages write shuffle files. Most stages are not shuffles
>
> On Sat, Jul 2, 2022, 7:28 AM krexos <kre...@protonmail.com.invalid> wrote:
>
>> Hello,
>>
>> One of the main "selling points" of Spark is that unlike Hadoop map-reduce
>> that persists intermediate results of its computation to HDFS (disk), Spark
>> keeps all its results in memory. I don't understand this as in reality when
>> a Spark stage finishes [it writes all of the data into shuffle files stored on the disk](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md).
>> How then is this an improvement on map-reduce?
>>
>> Image from https://youtu.be/7ooZ4S7Ay6Y
>>
>> thanks!