There are two parts to this.  First, the semantics of MapReduce guarantee that 
all tuples sharing the same key are sent to the same reducer, which is what 
lets you write useful MR applications that do things like “count words” or 
“summarize by date”.  Second, to honor that guarantee, the shuffle phase of MR 
partitions the map output by key, moving tuples that share a key to the same 
node where they can be processed together.  You can think of key-partitioning 
as a strategy that assists in parallel distributed sorting.
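
To make the guarantee concrete, here is a minimal sketch in plain Python (not Hadoop code) of hash partitioning by key. The function names partition, shuffle and word_count are made up for illustration, and crc32 is just a convenient stable hash; Hadoop's default HashPartitioner applies the same idea to the key's hashCode(), modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Return the reducer index for a key; equal keys always get the same index."""
    # crc32 is used as a stable hash (an assumption for this sketch);
    # Python's built-in hash() on strings varies between processes.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route each (key, value) pair to its reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    return reducers


def word_count(reducer_input):
    """Reduce step: sum the counts for every key this reducer received."""
    return {key: sum(values) for key, values in sorted(reducer_input.items())}


# Hypothetical map output for a word count job.
mapped = [("apple", 1), ("banana", 1), ("apple", 1),
          ("cherry", 1), ("banana", 1), ("apple", 1)]

reducers = shuffle(mapped, num_reducers=2)
results = [word_count(r) for r in reducers]
for index, counts in enumerate(results):
    print(f"reducer {index}: {counts}")
```

Because every "apple" tuple lands on one reducer, that reducer can produce the final count for "apple" without consulting any other node. Which reducer a given key goes to does not matter; what matters is that the choice depends only on the key.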
john

From: Sai Sai [mailto:saigr...@yahoo.in]
Sent: Friday, June 07, 2013 5:17 AM
To: user@hadoop.apache.org
Subject: Re: Why/When partitioner is used.

I always get confused about why we should partition and what it is used for.
Why would one want to send all the keys starting with A to Reducer1, those 
starting with B to Reducer2, and so on?
Is it just to parallelize the reduce process?
Please help.
Thanks
Sai
