Hi Tatarinov,

Thanks for the reply. Just to confirm my understanding: did you mean to set the
number of reduce tasks equal to the number of reduce slots in the cluster?

Regards
Abhi


Sent from my iPhone

On Jul 24, 2012, at 12:51 AM, Igor Tatarinov <i...@decide.com> wrote:

> Here are my 2 cents.
> The parameters you are looking at are quite specific. Unless you know what 
> you are doing, it can be hard to set them exactly right, and they shouldn't 
> make that much of a difference - again, unless you know the specifics.
> 
> What worked for me is using a single "wave" of reducers. Basically, you want 
> to set the number of reduce tasks to be equal to the number of reduce slots 
> (assuming your job will run by itself).
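> 
> A minimal sketch of that approach (the slot count of 40 here is hypothetical - 
> substitute your cluster's actual total reduce capacity):
> 
> ```sql
> -- Suppose the cluster exposes 40 reduce slots in total.
> -- Setting the reduce task count to the same number gives one "wave":
> set mapred.reduce.tasks=40;
> ```
> 
> With fewer tasks than slots, some slots sit idle; with more, a second wave of 
> reducers has to be scheduled after the first one finishes.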
> 
> It might also help to re-arrange your joins so that the larger table is 
> streamed (https://cwiki.apache.org/Hive/languagemanual-joins.html).
> That seems especially important with map joins since those fail if there is 
> not enough memory and have to be rerun as regular joins.
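> 
> For example, a rough sketch using the hints described on that wiki page (the 
> table and column names here are made up):
> 
> ```sql
> -- STREAMTABLE tells Hive to stream the larger table (orders) through
> -- the reducers instead of buffering it in memory:
> SELECT /*+ STREAMTABLE(o) */ o.order_id, c.name
> FROM orders o JOIN customers c ON (o.customer_id = c.id);
> 
> -- MAPJOIN loads the small table into memory on the map side; if it
> -- does not fit, the job fails and is rerun as a regular join:
> SELECT /*+ MAPJOIN(c) */ o.order_id, c.name
> FROM orders o JOIN customers c ON (o.customer_id = c.id);
> ```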
> 
> Hope this helps.
> 
> On Mon, Jul 23, 2012 at 6:54 PM, abhiTowson cal <abhishek.dod...@gmail.com> 
> wrote:
> Hi all,
> 
> Some queries in Hive are executing for too long, so I have overridden
> some parameters in Hive. For some queries, performance increased rapidly
> when I overrode these properties; for other queries, there was no change in
> performance. Can anyone tell me about any other optimizations in Hive
> apart from partitions and buckets?
> 
> set io.sort.mb=512;
> set io.sort.factor=100;
> set mapred.reduce.parallel.copies=40;
> set hive.map.aggr =true;
> set hive.exec.parallel=true;
> set hive.groupby.skewindata=true;
> set mapred.job.reuse.jvm.num.tasks=-1;
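> 
> As an illustration of where two of these settings apply (the query and table 
> below are hypothetical), hive.map.aggr and hive.groupby.skewindata both 
> target GROUP BY queries:
> 
> ```sql
> -- hive.map.aggr=true performs partial aggregation on the map side;
> -- hive.groupby.skewindata=true adds an extra MR stage that spreads a
> -- skewed key (e.g. one very common country) across multiple reducers:
> set hive.map.aggr=true;
> set hive.groupby.skewindata=true;
> SELECT country, count(*) FROM users GROUP BY country;
> ```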
> 
> default values were
> 
> io.sort.mb=256;
> io.sort.factor=10;
> mapred.reduce.parallel.copies=10;
> 
> Thanks
> Abhishek
> 
