One reason Spark is faster than MapReduce even when working from disk is Spark's Directed Acyclic Graph (DAG) execution engine. MapReduce requires a complex job to be split into multiple MapReduce jobs, with each job writing its output to disk and the next job reading it back in. With Spark, you may be able to express the same work as a single job with fewer stages, incurring less disk I/O. Disk I/O is expensive, so fewer disk I/O operations translate to better performance.
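To make the argument concrete, here is a rough sketch in plain Python. It is a toy counting model, not Spark or Hadoop code. The operation names, the `intermediate_io` helper, and the cost model itself are assumptions made up for illustration: each shuffle ends one MapReduce job, and each boundary between chained jobs costs one full write plus one full read of the intermediate data. The shuffles themselves are counted separately, since both engines still have to perform them.

```python
# Toy cost model (an illustrative assumption, not Spark or Hadoop code).
NARROW, WIDE = "narrow", "wide"  # wide = the operation needs a shuffle


def split_into_jobs(ops):
    """Group operations into MapReduce-style jobs, ending a job at each wide op."""
    jobs, current = [], []
    for name, kind in ops:
        current.append(name)
        if kind == WIDE:
            jobs.append(current)
            current = []
    if current:  # trailing narrow ops are modeled as one extra map-only job
        jobs.append(current)
    return jobs


def intermediate_io(ops):
    """Count full passes over intermediate data at job boundaries."""
    jobs = split_into_jobs(ops)
    boundaries = max(len(jobs) - 1, 0)
    shuffles = sum(1 for _, kind in ops if kind == WIDE)
    return {
        "mapreduce_jobs": len(jobs),
        # One write by the finishing job plus one read by the next job.
        "mapreduce_boundary_passes": 2 * boundaries,
        # A single DAG has no job boundaries inside it in this model.
        "spark_boundary_passes": 0,
        # Both engines still shuffle; a DAG has one more stage than shuffles.
        "shuffles": shuffles,
        "spark_stages": shuffles + 1,
    }


# Hypothetical pipeline, invented for this example.
pipeline = [
    ("parse", NARROW),
    ("filter", NARROW),
    ("group_by_user", WIDE),
    ("score", NARROW),
    ("join_profiles", WIDE),
    ("top_n", WIDE),
]

result = intermediate_io(pipeline)
print(result)
```

In this model the pipeline becomes three chained MapReduce jobs with four extra passes over intermediate data, versus one Spark job whose stages are separated only by the shuffles. Real numbers depend on data sizes, cluster configuration, and how much of the shuffle data is served from the OS buffer cache.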
Mohammed

From: Ilya Ganelin [mailto:ilgan...@gmail.com]
Sent: Monday, April 27, 2015 7:55 PM
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk

I believe the typical answer is that Spark is actually a bit slower.

On Mon, Apr 27, 2015 at 7:34 PM bit1...@163.com <bit1...@163.com> wrote:

Hi,

I am frequently asked why Spark is also much faster than Hadoop MapReduce on disk (without the use of memory cache). I have no convincing answer for this question. Could you guys elaborate on this?

Thanks!