RE: mapPartitions versus map overhead?

Shao, Saisai Mon, 23 Dec 2013 22:11:02 -0800

Hi Huan Dao,

Actually it is the same for map and mapPartitions if you do transformations 
like this:
a.map(r => r * 2)
a.mapPartitions(iter => iter.map(r => r *2))


these are iterator to iterator transformations. 

But mapPartitions are more flexible than map, you can do transformation like: 
Iterator[A] => Iterator[B], where Iterator[B] can be anything iterable, there's 
no one to one mapping constraint. In short words, mapPartitions is quite like 
superset of map. You can check MappedRDD and MapPartitionsRDD to see the 
details.

Thanks
Jerry

-----Original Message-----
From: Huan Dao [mailto:huan...@me.com] 
Sent: Tuesday, December 24, 2013 1:15 PM
To: user@spark.incubator.apache.org
Subject: mapPartitions versus map overhead?

Hi all, is there any overhead of mapPartitions versus overhead, if I implement 
an algorithm using map -> reduce versus mapPartitions -> reduce.
Thanks, 
Huan Dao

RE: mapPartitions versus map overhead?

Reply via email to