Here is my 2 cents.
The parameters you are looking at are quite specific. Unless you know
exactly what you are doing, they are hard to set correctly, and they
shouldn't make that much of a difference anyway.

What worked for me is using a single "wave" of reducers: set the number
of reduce tasks equal to the number of reduce slots in the cluster
(assuming your job will run by itself).
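For example, if your cluster had 40 reduce slots (the number here is just a
placeholder, use your own capacity), a sketch of that would be:

set mapred.reduce.tasks=40;   -- one reducer per available reduce slot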

It might also help to rearrange your joins so that the larger table is
streamed (https://cwiki.apache.org/Hive/languagemanual-joins.html).
That seems especially important with map joins, since those fail if there is
not enough memory and have to be rerun as regular joins.
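As a rough sketch (big_table and small_table are made-up names), you can hint
which table gets streamed, or force a map join when you are confident the
small table fits in memory:

-- stream the larger table instead of buffering it
SELECT /*+ STREAMTABLE(b) */ b.key, s.value
FROM big_table b JOIN small_table s ON (b.key = s.key);

-- map join: load the small table into memory on each mapper
SELECT /*+ MAPJOIN(s) */ b.key, s.value
FROM big_table b JOIN small_table s ON (b.key = s.key);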

Hope this helps.

On Mon, Jul 23, 2012 at 6:54 PM, abhiTowson cal
<abhishek.dod...@gmail.com> wrote:

> Hi all,
>
> Some queries in Hive are executing for too long, so I have overridden
> some parameters in Hive. For some queries performance increased rapidly
> when I overrode these properties; for other queries there was no change
> in performance. Can anyone tell me about any other optimizations in
> Hive apart from partitions and buckets?
>
> set io.sort.mb=512;
> set io.sort.factor=100;
> set mapred.reduce.parallel.copies=40;
> set hive.map.aggr=true;
> set hive.exec.parallel=true;
> set hive.groupby.skewindata=true;
> set mapred.job.reuse.jvm.num.tasks=-1;
>
> The default values were:
>
> io.sort.mb=256;
> io.sort.factor=10;
> mapred.reduce.parallel.copies=10;
>
> Thanks
> Abhishek
>
