Thanks Yongqiang for your reply. I'm running a Hive script that has nearly 10 
joins in it. All of the map joins involving smaller tables (9 of them, each 
with one small table) are running fine. Just one join is between two larger 
tables, and that map join fails; however, since the backup task (the common 
join) executes successfully, the whole Hive job still runs to completion.
      In brief, my Hive job is running successfully now, but I'd like to get 
the failed map join working as well instead of falling back to the common join. 
I'm curious to see what performance improvement that difference in execution 
would bring.
       To get a map join executed on larger tables, do I have to tune memory 
parameters at the Hadoop level? Since my entire job is already running to 
completion and I only want to get this one map join working, shouldn't 
altering some Hive map join parameters, as in the sketch below, do the job?
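Something along these lines at the top of my script is what I have in mind 
(a rough sketch only; the values are placeholders based on what I tried 
earlier, not tested settings):

-- sketch only: values are guesses for my data sizes
set hive.auto.convert.join=true;
-- raise the small-table size threshold (in bytes) so the bigger table still qualifies
set hive.mapjoin.smalltable.filesize=40000000;
-- raise the row limit for the map join hash table
set hive.mapjoin.maxsize=1000000;
-- let the local task fill up to 90% of its heap
set hive.mapjoin.localtask.max.memory.usage=0.90;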
Please advise.

        
Regards
Bejoy K S

-----Original Message-----
From: yongqiang he <heyongqiang...@gmail.com>
Date: Thu, 31 Mar 2011 16:25:03 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: Hive map join - process a little larger tables with moderate
 number of rows

You possibly got an OOM error when processing the small tables. OOM is
a fatal error that cannot be controlled by the Hive configs. So can
you try increasing your memory setting?
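For example, since the local task that builds the hash table runs in a
separate client-side JVM, one common option (an assumption about your
setup; the right knob depends on how Hive launches the local task in your
environment) is to raise the Hadoop client heap before starting Hive:

# HADOOP_HEAPSIZE is the standard hadoop-env.sh heap setting, in MB;
# 2048 is only an illustrative value
export HADOOP_HEAPSIZE=2048
hive -f HiveJob.txt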

thanks
yongqiang
On Thu, Mar 31, 2011 at 7:25 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:
> Hi Experts
>     I'm currently working with Hive 0.7, mostly with JOINs. In all
> permissible cases I'm using map joins by setting the
> hive.auto.convert.join=true parameter. Use of local map joins has made a
> considerable performance improvement in my Hive queries. So far I have used
> local map joins only with the default Hive configuration parameters; now I'd
> like to dig deeper and try them on somewhat bigger tables with more rows.
> Given below is the failure log of one of my local map tasks, which then fell
> back to executing its backup common join task:
>
> 2011-03-31 09:56:54     Starting to launch local task to process map
> join;      maximum memory = 932118528
> 2011-03-31 09:56:57     Processing rows:        200000  Hashtable size:
> 199999  Memory usage:   115481024       rate:   0.124
> 2011-03-31 09:57:00     Processing rows:        300000  Hashtable size:
> 299999  Memory usage:   169344064       rate:   0.182
> 2011-03-31 09:57:03     Processing rows:        400000  Hashtable size:
> 399999  Memory usage:   232132792       rate:   0.249
> 2011-03-31 09:57:06     Processing rows:        500000  Hashtable size:
> 499999  Memory usage:   282338544       rate:   0.303
> 2011-03-31 09:57:10     Processing rows:        600000  Hashtable size:
> 599999  Memory usage:   336738640       rate:   0.361
> 2011-03-31 09:57:14     Processing rows:        700000  Hashtable size:
> 699999  Memory usage:   391117888       rate:   0.42
> 2011-03-31 09:57:22     Processing rows:        800000  Hashtable size:
> 799999  Memory usage:   453906496       rate:   0.487
> 2011-03-31 09:57:27     Processing rows:        900000  Hashtable size:
> 899999  Memory usage:   508306552       rate:   0.545
> 2011-03-31 09:57:34     Processing rows:        1000000 Hashtable size:
> 999999  Memory usage:   562706496       rate:   0.604
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapredLocalTask
> ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
> Launching Job 4 out of 6
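> (Note: the "rate" column above is just Memory usage divided by the maximum
> memory reported at the start, e.g. 562706496 / 932118528 ≈ 0.604.)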
>
>
> Here i"d like to make this local map task running, for the same i tried
> setting the following hive parameters as
> hive -f  HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf
> hive.mapjoin.smalltable.filesize=40000000 -hiveconf
> hive.auto.convert.join=true
> Butting setting the two config parameters doesn't make my local map task
> proceed beyond this stage.  I didn't try out
> overriding the hive.mapjoin.localtask.max.memory.usage=0.90 because from my
> task log shows that the memory usage rate is just 0.604, so i assume setting
> the same with a larger value wont cater to a solution in my case.Could some
> one please guide me what are the actual parameters and the values I should
> set to get things rolling.
>
> Thank You
>
> Regards
> Bejoy.K.S
>
>
