Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max"
property.

Regards
Abhi



Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada 
<bharathvissapragada1...@gmail.com> wrote:

> 
> I'm no expert in Hive, but here are my 2 cents.
> 
> By default, Hive schedules one reducer for every 1 GB of input data (you can 
> change that value via hive.exec.reducers.bytes.per.reducer). If your input 
> data is huge, Hive will schedule a large number of reducers, which may be 
> unnecessary. (Sometimes a large number of reducers actually slows the job 
> down: their number exceeds the total reduce slots, so they sit waiting for 
> their turn. Not to forget the initialization overhead for each task: JVM 
> startup, etc.)
> 
> Overall, I don't think there is a single optimal value for a cluster. It 
> depends on the type of queries, the size of your inputs, and the size of the 
> map (intermediate) outputs in your jobs. So try various values and see which 
> one works best. In my experience, setting "hive.exec.reducers.max" to the 
> total number of reduce slots in your cluster gives decent performance, since 
> all the reducers complete in a single wave. (This may or may not work for 
> you, but it's worth a try.)
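As a concrete sketch of that suggestion, here is how the session settings might look for a hypothetical cluster with 40 reduce slots (the slot count and the 2 GB figure are illustrative, not recommendations):

```sql
-- Hypothetical cluster with 40 reduce slots: cap reducers so a
-- query's reducers can all run in a single wave.
set hive.exec.reducers.max=40;

-- Optionally raise the per-reducer input size so Hive schedules
-- fewer reducers in the first place (here: ~2 GB per reducer).
set hive.exec.reducers.bytes.per.reducer=2000000000;
```

These are per-session settings; run them in the Hive CLI before the query, or put cluster-wide defaults in hive-site.xml.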
> 
> 
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a doubt regarding the properties below: is it good practice to 
>> override them in Hive?
>> 
>> If yes, what are the optimal values for the following properties?
>> 
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> 
>> Regards
>> Abhi
>> 
>> Sent from my iPhone
> 
> 
> 
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
