Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.
Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bharathvissapragada1...@gmail.com> wrote:

> I'm no expert in Hive, but here are my 2 cents.
>
> By default Hive schedules one reducer per 1 GB of input data (you can change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows the job down, because their number exceeds the total number of reduce slots and they keep waiting for their turn. Not to forget the initialization overhead of each task: JVM startup, etc.)
>
> Overall, I don't think there is a single optimal value for a cluster. It depends on the type of queries, the size of your inputs, and the size of the map outputs (intermediate outputs) in the jobs. So you can try various values and see which one works best. In my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it's worth a try.)
>
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have a doubt regarding the properties below. Is it good practice to override them in Hive?
>>
>> If yes, what are the optimal values for the following properties?
>>
>> set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>> set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>> set mapred.reduce.tasks=<number>
>>
>> Regards
>> Abhi
>
> --
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
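For reference, here is a rough Python sketch of how the three properties interact, following the estimate Bharath describes. It assumes the older (pre-0.14) Hive defaults of 1,000,000,000 bytes per reducer and a cap of 999 reducers, and that a positive mapred.reduce.tasks replaces the estimate with a constant. The function names and the example cluster size are hypothetical, and counting "waves" is a simplification, since reducers do not actually start in strict lockstep.

```python
# Assumed defaults from older Hive releases (pre-0.14); check your hive-site.xml.
DEFAULT_BYTES_PER_REDUCER = 1_000_000_000  # hive.exec.reducers.bytes.per.reducer
DEFAULT_MAX_REDUCERS = 999                 # hive.exec.reducers.max


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=DEFAULT_BYTES_PER_REDUCER,
                      max_reducers=DEFAULT_MAX_REDUCERS,
                      mapred_reduce_tasks=-1):
    """Approximate the reducer count for one job (hypothetical helper)."""
    # A positive mapred.reduce.tasks pins the reducer count to a constant.
    if mapred_reduce_tasks > 0:
        return mapred_reduce_tasks
    # Otherwise: one reducer per bytes_per_reducer of input, rounded up ...
    reducers = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # ... with at least one reducer and never more than hive.exec.reducers.max.
    return min(max_reducers, max(1, reducers))


def reduce_waves(num_reducers, total_reduce_slots):
    """Rounds needed if each reduce slot runs one reducer at a time (simplified)."""
    return (num_reducers + total_reduce_slots - 1) // total_reduce_slots


if __name__ == "__main__":
    input_bytes = 500 * 10**9  # hypothetical 500 GB of job input
    slots = 40                 # hypothetical cluster-wide reduce slots
    default = estimate_reducers(input_bytes)
    capped = estimate_reducers(input_bytes, max_reducers=slots)
    print(default, reduce_waves(default, slots))  # 500 reducers, 13 waves
    print(capped, reduce_waves(capped, slots))    # 40 reducers, 1 wave
```

Capping hive.exec.reducers.max at the slot count trades more data per reducer for finishing the reduce phase in one round, which is the trade-off Bharath suggests measuring on your own queries.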