Thank you for the info, Bejoy.

Cheers!
Manoj.



On Thu, Nov 22, 2012 at 12:04 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:

> Hi Manoj
>
> If you intend to calculate the number of reducers based on the input size,
> then in your driver class you should get the size of the input dir in HDFS.
> Say you intend to give n bytes to each reducer; the number of reducers can
> then be computed as
> total input size / bytes per reducer.
>
> You can round this value and use it to set the number of reducers in the
> conf programmatically.
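>
> A minimal driver-side sketch of that calculation, assuming the old mapred
> API; the input path, driver class name, and the 1 GB figure are
> illustrative choices, not something fixed by Hadoop:
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.JobConf;
>
> JobConf conf = new JobConf(MyDriver.class); // MyDriver is a placeholder
> FileSystem fs = FileSystem.get(conf);
> // Total size in bytes of the input dir in HDFS
> long inputBytes =
>     fs.getContentSummary(new Path("/user/manoj/input")).getLength();
> long bytesPerReducer = 1024L * 1024 * 1024; // the "n bytes" per reducer
> // Divide, round up, and keep at least one reducer
> int numReducers = (int) Math.max(1L,
>     (inputBytes + bytesPerReducer - 1) / bytesPerReducer);
> conf.setNumReduceTasks(numReducers);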
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> From: Manoj Babu <manoj...@gmail.com>
> Date: Wed, 21 Nov 2012 23:28:00 +0530
> To: <user@hadoop.apache.org>
> Cc: bejoy.had...@gmail.com <bejoy.had...@gmail.com>
> Subject: Re: guessing number of reducers.
>
> Hi,
>
> How do I set the number of reducers in the job conf dynamically?
> For example, on heavy-traffic days I get 500GB of data, and on some days
> only 100GB.
>
> Thanks in advance!
>
> Cheers!
> Manoj.
>
>
>
> On Wed, Nov 21, 2012 at 11:19 PM, Kartashov, Andy
> <andy.kartas...@mpac.ca> wrote:
>
>>  Bejoy,
>>
>>
>>
>> I’ve read somewhere about keeping the number of mapred.reduce.tasks below
>> the reduce task capacity. Here is what I just tested:
>>
>>
>>
>> Output: 25GB. 8-DataNode cluster with a capacity of 16 map and 16 reduce
>> tasks:
>>
>>
>>
>> 1 Reducer   – 22 mins
>> 4 Reducers  – 11.5 mins
>> 8 Reducers  – 5 mins
>> 10 Reducers – 7 mins
>> 12 Reducers – 6.5 mins
>> 16 Reducers – 5.5 mins
>>
>>
>>
>> 8 reducers won the race, but reducers at the max capacity were very
>> close. :)
>>
>>
>>
>> AK47
>>
>>
>>
>>
>>
>> From: Bejoy KS [mailto:bejoy.had...@gmail.com]
>> Sent: Wednesday, November 21, 2012 11:51 AM
>> To: user@hadoop.apache.org
>> Subject: Re: guessing number of reducers.
>>
>>
>>
>> Hi Sasha
>>
>> In general, the number of reduce tasks is chosen mainly based on the data
>> volume reaching the reduce phase. In tools like Hive and Pig there is by
>> default one reducer for every 1GB of map output, so 100 gigs of map output
>> means 100 reducers.
>> If your tasks are more CPU intensive, then you need a smaller volume of
>> data per reducer for better performance.
>>
>> In general it is better to have the number of reduce tasks slightly less
>> than the number of available reduce slots in the cluster.
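>>
>> A sketch of applying that cap, assuming the old mapred API and that
>> numReducers holds the size-based count computed earlier in the thread;
>> the minus-one margin is just an illustrative choice:
>>
>> import org.apache.hadoop.mapred.ClusterStatus;
>> import org.apache.hadoop.mapred.JobClient;
>>
>> // Query total reduce slots and stay slightly below that capacity
>> ClusterStatus status = new JobClient(conf).getClusterStatus();
>> int reduceSlots = status.getMaxReduceTasks(); // cluster-wide reduce slots
>> conf.setNumReduceTasks(Math.min(numReducers, Math.max(1, reduceSlots - 1)));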
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>  ------------------------------
>>
>> From: jamal sasha <jamalsha...@gmail.com>
>> Date: Wed, 21 Nov 2012 11:38:38 -0500
>> To: user@hadoop.apache.org <user@hadoop.apache.org>
>> Reply-To: user@hadoop.apache.org
>> Subject: guessing number of reducers.
>>
>>
>>
>> By default the number of reducers is set to 1.
>> Is there a good way to guess the optimal number of reducers?
>> Let's say I have TBs worth of data and mappers on the order of 5000 or so,
>> but ultimately I am calculating, let's say, some average over the whole
>> data... say the average transaction occurring.
>> Now the output will be just one line in one "part" file; the rest of them
>> will be empty. So I am guessing I need loads of reducers, but then most of
>> them will be empty, while at the same time one reducer won't suffice.
>> What's the best way to solve this?
>> How do I guess the optimal number of reducers?
>> Thanks
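>>
>> (One common way out of exactly this bind is a combiner: mappers emit
>> partial (sum, count) pairs under a single key, the combiner merges them
>> per map task, and a single reducer then suffices because it only sees
>> about one tiny record per mapper. A minimal sketch; the class name and
>> the "sum,count" text encoding are illustrative, not from this thread:)
>>
>> import java.io.IOException;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Reducer;
>>
>> // Combiner that merges per-mapper "sum,count" partials; a final reducer
>> // would do the same merge and then emit sum / count as the average.
>> public class SumCountCombiner extends Reducer<Text, Text, Text, Text> {
>>   @Override
>>   protected void reduce(Text key, Iterable<Text> values, Context ctx)
>>       throws IOException, InterruptedException {
>>     double sum = 0;
>>     long count = 0;
>>     for (Text v : values) {                 // each value is "sum,count"
>>       String[] parts = v.toString().split(",");
>>       sum += Double.parseDouble(parts[0]);
>>       count += Long.parseLong(parts[1]);
>>     }
>>     ctx.write(key, new Text(sum + "," + count));
>>   }
>> }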
>>
>
>
