Hello Jamal,

For efficient processing, all the values associated with the same key are sorted and sent to the same reducer. As a result, the reducer receives a key and a list of values as its input. Your assumption seems correct to me.
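To make the grouping concrete, here is a minimal pure-Python sketch of the map, shuffle/sort and reduce steps. It is only a local simulation, not Hadoop itself: the function names `mapper`, `shuffle`, `reducer` and `run_job` are made up for illustration, and the numbers 1.0, 2.0 and 3.0 are sample stand-ins for value1, value2 and value3.

```python
from collections import defaultdict


def mapper(record):
    """Emit one (key, value) pair per input record (hypothetical mapper)."""
    key, value = record
    yield key, float(value)


def shuffle(pairs):
    """Stand-in for the framework's shuffle/sort step.

    Collects every value under its key and returns the groups in key order,
    which is the shape a reducer sees: one key plus all of its values.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Receive one key with all of its values and return the average."""
    values = list(values)
    return key, sum(values) / len(values)


def run_job(records):
    """Run map, then shuffle, then reduce over an in-memory list of records."""
    pairs = [pair for record in records for pair in mapper(record)]
    return [reducer(key, values) for key, values in shuffle(pairs)]


# Sample data only: Key1 appears twice, Key2 once.
records = [("Key1", 1.0), ("Key2", 2.0), ("Key1", 3.0)]
print(run_job(records))  # [('Key1', 2.0), ('Key2', 2.0)]
```

The point of the sketch is that the dictionary lives in the shuffle step, which the framework does for you, so the reducer only has to handle one key and its values at a time.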
Regards,
Mohammad Tariq

On Thu, Nov 22, 2012 at 1:20 AM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi..
> I guess I am asking a lot of fundamental questions, but I thank you guys for
> taking out time to explain my doubts.
> So I am able to write map reduce jobs, but here is my doubt.
> As of now I am writing mappers which emit a key and a value.
> This key value is then captured at the reducer end and then I process the key
> and value there.
> Let's say I want to calculate the average...
> Key1 value 1
> Key2 value 2
> Key1 value 3
>
> So the output is something like
> Key1 average of value 1 and value 3
> Key2 average 2 = value 2
>
> Right now in the reducer I have to create a dictionary with the original
> keys as keys and a list as the value.
> Data = defaultdict(list) == // python user
> But I thought that
> the mapper takes in the key value pairs and outputs key: (v1, v2, ...) and
> the reducer takes in this key and list of values and returns
> key, new value..
>
> So why is the input of the reducer the simple output of the mapper and not the
> list of all the values for a particular key, or did I misunderstand something?
> Am I making any sense??
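
On the `defaultdict(list)` workaround in the quoted message: if the job runs through Hadoop Streaming (an assumption, since the message only says Python), the reducer script does not receive a ready-made list. It reads sorted `key<TAB>value` lines on stdin, so all lines for one key arrive next to each other. A rough sketch that uses this ordering with `itertools.groupby` instead of a dictionary over all keys could look like the following. The helper names `parse`, `average_by_key` and `main` are hypothetical, and the tab separator is the Streaming default.

```python
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Split each 'key<sep>value' line into a (key, float) pair; skip blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, value = line.split(sep, 1)
        yield key, float(value)


def average_by_key(lines, sep="\t"):
    """Yield (key, average) for each run of consecutive lines sharing a key.

    Relies on the input being sorted by key, so only one key's running
    total and count are held in memory at a time.
    """
    for key, group in groupby(parse(lines, sep), key=lambda pair: pair[0]):
        total = 0.0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total / count


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Reducer entry point: read sorted lines, write 'key<TAB>average' lines."""
    for key, avg in average_by_key(stdin):
        stdout.write(f"{key}\t{avg}\n")


# A real reducer script would end with a call to main(); it is left out here
# so the sketch can be imported or pasted without blocking on stdin.
```

It can be tried locally without a cluster by calling `average_by_key` on a list of already sorted lines such as `["Key1\t1\n", "Key1\t3\n", "Key2\t2\n"]`.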