Hi again,
I've run into this link:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201112.mbox/%3ccafe9998.2fef6%25ev...@yahoo-inc.com%3E
Looks like a nice idea. Has anyone tried something similar?

Thanks


On Wed, Sep 18, 2013 at 4:46 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:

> Yes, you are correct that the copying phase starts while the maps are
> running and the reduce function is not called until everything is done,
> but aren't the Reduce tasks also already 'initialized' at this point?
> Which, as far as I know (and I might be wrong), means they will not have
> the map input records counter yet (which was my point)?
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <
> rahul.rec....@gmail.com> wrote:
>
>> Shahab,
>>
>> One question - You mentioned - "In the normal configuration, the issue
>> here is that Reducers can start before all the Maps have finished so it is
>> not possible to get the number (or make sense of it even if you are able
>> to,)"
>>
>> I think reducers would start copying the data from the completed map
>> tasks, but will not start the actual reduce process until the data from
>> all the mappers has been pulled in.
>>
>> So, the call to the counter that Yaron has made might work, if invoked
>> from the reduce method.
>>
>> Thanks,
>> Rahul
>>
>>
>>
>> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 
>> <java8...@hotmail.com> wrote:
>>
>>> Or you could do the calculation in the reducer's close() method, even
>>> though I am not sure you can get the Mapper's counter in the reducer.
>>>
>>> But even if you can't, here is what you can do:
>>> 1) Save the JobConf reference in your Mapper's configure() method
>>> 2) In the close() method of the mapper, store the MAP_INPUT_RECORDS
>>> counter value in the configuration object as your own property
>>> 3) Retrieve that property in the reducer's close() method; then you have
>>> both numbers at that time.
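>>> An untested sketch of steps 1)-3) with the old mapred API might look
>>> like the following. The property name my.map.input.records is made up,
>>> and note the caveat: each map task runs in its own JVM, so whether a
>>> property set in the task-local JobConf is ever visible to reduce tasks
>>> is exactly the open question in this thread.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.Task;

public class CountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private JobConf conf;       // step 1: saved in configure()
  private Reporter reporter;  // close() gets no Reporter, so keep the last one

  @Override
  public void configure(JobConf job) {
    this.conf = job;
  }

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    this.reporter = reporter;
    // ... actual map logic goes here ...
  }

  @Override
  public void close() throws IOException {
    // step 2: stash the counter value under our own (made-up) property name
    long n = reporter.getCounter(Task.Counter.MAP_INPUT_RECORDS).getCounter();
    conf.setLong("my.map.input.records", n);
  }
}
```

>>> The reducer would then do step 3) by calling
>>> conf.getLong("my.map.input.records", -1) in its own close(), with -1
>>> flagging that the value never arrived.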
>>>
>>> Yong
>>>
>>> ------------------------------
>>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>>> From: shahab.yu...@gmail.com
>>> To: user@hadoop.apache.org
>>>
>>>
>>> In the normal configuration, the issue here is that Reducers can start
>>> before all the Maps have finished, so it is not possible to get the
>>> number (or to make sense of it even if you are able to).
>>>
>>> Having said that, you can specifically make sure that Reducers don't
>>> start until all your maps have completed. It will of course slow down
>>> your job. I don't know whether it will work with this option, but you
>>> can try it (until the experts have some advice).
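>>> If you want to try this, the knob is the reduce slow-start threshold; a
>>> one-line sketch (property name as in Hadoop 1.x; newer releases call it
>>> mapreduce.job.reduce.slowstart.completedmaps):

```java
// Ask the framework not to launch reducers until 100% of maps are done.
// The default is much lower (around 0.05), so reducers may start very early.
conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
```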
>>>
>>> Regards,
>>> Shahab
>>>
>>>
>>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <yaron.go...@gmail.com> wrote:
>>>
>>> Hi,
>>> Is there a way for the reducer to get the total number of input records
>>> to the map phase?
>>> For example, I want the reducer to normalize a sum by dividing it by the
>>> number of records. I tried getting the value of that counter by using the
>>> line:
>>>
>>> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>>>
>>> in the reducer code, but I got 0.
>>>
>>> Thanks!
>>> Yaron
>>>
>>>
>>>
>>
>
