1) Spark only needs to shuffle when data must be repartitioned across the workers in an all-to-all fashion. 2) Multi-stage jobs that would normally require several MapReduce jobs, forcing intermediate data to be dumped to disk between jobs, can instead keep that intermediate data cached in memory.
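The second point can be sketched with Spark's RDD API in Scala. This is a minimal illustration, not code from the thread; the input path and field layout are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-demo"))

    // Hypothetical input path.
    val lines = sc.textFile("hdfs:///data/events.txt")

    // First stage: parse and filter. cache() keeps the result in memory,
    // so later jobs reuse it instead of recomputing it or, as a chain of
    // MapReduce jobs would, re-reading it from HDFS.
    val parsed = lines.map(_.split(",")).filter(_.nonEmpty).cache()

    // Two separate actions over the same intermediate data; with
    // MapReduce, each would be its own job with a disk round-trip between.
    val total = parsed.count()
    val distinctKeys = parsed.map(_(0)).distinct().count()

    println(s"$total records, $distinctKeys distinct keys")
    sc.stop()
  }
}
```

Note that only the `distinct()` here triggers a shuffle (point 1); the `map` and `filter` stay within each partition.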
- Newbie question: what makes Spark run faster than MapReduce — Muler
- Re: Newbie question: what makes Spark run faster than MapReduce — Hien Luu
- Re: Newbie question: what makes Spark run faster than MapReduce — Corey Nolet