One example is when you'd like to set up a JDBC connection for each partition
and share that connection across all the records in it.

mapPartitions is much closer to the mapper paradigm in MapReduce. In a
MapReduce mapper, you have a setup method to do any initialization before
processing the split, and then you read and process the records one by one in
the map method.
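
The setup-once-per-partition idea above can be sketched without a running
cluster. This is only an illustration under stated assumptions: FakeConnection
is a hypothetical stand-in for a real JDBC-style connection, and the two
partitions are simulated as plain Python lists rather than supplied by Spark.

```python
from typing import Iterable, Iterator, List


class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how often it is opened."""

    opened = 0  # class-level counter of connections created

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def lookup(self, record: int) -> str:
        # Pretend to query a database for this record.
        return f"row-{record}"

    def close(self) -> None:
        pass


def process_partition(records: Iterable[int]) -> Iterator[str]:
    """mapPartitions-style: open one connection and reuse it for the whole partition."""
    conn = FakeConnection()           # setup: once per partition
    try:
        for record in records:        # map step: record by record
            yield conn.lookup(record)
    finally:
        conn.close()                  # cleanup: once per partition


def process_record(record: int) -> str:
    """map-style: pays the setup cost again for every record."""
    conn = FakeConnection()
    try:
        return conn.lookup(record)
    finally:
        conn.close()


# Two partitions simulated as lists (an assumption; Spark would pass iterators).
partitions: List[List[int]] = [[1, 2, 3], [4, 5]]

FakeConnection.opened = 0
per_partition = [out for part in partitions for out in process_partition(part)]
opened_per_partition = FakeConnection.opened  # one connection per partition

FakeConnection.opened = 0
per_record = [process_record(r) for part in partitions for r in part]
opened_per_record = FakeConnection.opened     # one connection per record
```

In PySpark the same two functions would be passed as
rdd.mapPartitions(process_partition) and rdd.map(process_record). Because
process_partition is a generator, the connection stays open until the
partition's iterator has been fully consumed.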

On Wed, Jun 24, 2015 at 8:03 AM, Holden Karau <hol...@pigscanfly.ca> wrote:

> I think one of the primary cases where mapPartitions is useful is if you are
> going to be doing any setup work that can be re-used between processing
> each element; this way the setup work only needs to be done once per
> partition (for example, creating an instance of jodatime).
>
> Both map and mapPartitions are implemented using the MapPartitionsRDD.
>
> In general, if your logic is easily expressed with map, and there isn't any
> setup work you are doing that could be shared, using map instead of
> mapPartitions tends to result in more readable code, which is valuable in and
> of itself.
>
> On Tue, Jun 23, 2015 at 4:57 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
> wrote:
>
>> I know when to use map(), but when should I use mapPartitions()?
>>
>> Which is faster?
>>
>> --
>> Deepak
>>
>>
>
>
