Hi Huan Dao, Actually it is the same for map and mapPartitions if you do transformations like this: a.map(r => r * 2) a.mapPartitions(iter => iter.map(r => r *2))
these are iterator to iterator transformations. But mapPartitions are more flexible than map, you can do transformation like: Iterator[A] => Iterator[B], where Iterator[B] can be anything iterable, there's no one to one mapping constraint. In short words, mapPartitions is quite like superset of map. You can check MappedRDD and MapPartitionsRDD to see the details. Thanks Jerry -----Original Message----- From: Huan Dao [mailto:huan...@me.com] Sent: Tuesday, December 24, 2013 1:15 PM To: user@spark.incubator.apache.org Subject: mapPartitions versus map overhead? Hi all, is there any overhead of mapPartitions versus overhead, if I implement an algorithm using map -> reduce versus mapPartitions -> reduce. Thanks, Huan Dao