Re: map join and OOM

2010-02-25 Thread Ning Zhang
org" > Sent: 2010/2/25 (Thu) 1:19:16 PM > Subject: Re: map join and OOM > > > Edward, > > Multi-table joins are practical and we do a lot of these here. If the OOM > exception was thrown from a regular reduce-side join, it may be caused by > skew

Re: map join and OOM

2010-02-25 Thread Ning Zhang
This seems to be a Hadoop issue. We didn't run Hadoop 0.18 or 0.19 in house. Can you try on Hadoop 0.17 or 0.20 if possible? Thanks, Ning On Feb 25, 2010, at 11:48 AM, Edward Capriolo wrote: > On Thu, Feb 25, 2010 at 1:19 PM, Ning Zhang wrote: >> Edward, >> Multi-table joins are practical and w

Re: map join and OOM

2010-02-25 Thread Gang Luo
o the join there? Thanks. -Gang From: Ning Zhang To: "hive-user@hadoop.apache.org" Sent: 2010/2/25 (Thu) 1:19:16 PM Subject: Re: map join and OOM Edward, Multi-table joins are practical and we do a lot of these here. If the OOM exception was thrown f

Re: map join and OOM

2010-02-25 Thread Ning Zhang
Edward, Multi-table joins are practical and we do a lot of these here. If the OOM exception was thrown from a regular reduce-side join, it may be caused by skewness in your join keys. From branch-0.5 and forward, you will have a parameter hive.join.cache.size to control how many rows you wa
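
The skew Ning mentions can be checked before running the join. In a reduce-side join, all rows that share a join key are sent to the same reducer, so one dominant key concentrates rows on a single reducer. The following is a minimal sketch, assuming the join keys can be sampled into a Python iterable; the function name `key_skew` is hypothetical and not part of Hive.

```python
from collections import Counter


def key_skew(keys):
    """Return (most_common_key, share_of_rows) for an iterable of join keys.

    Hypothetical helper for illustration only; it is not a Hive API.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        # No rows sampled: nothing to report.
        return None, 0.0
    key, n = counts.most_common(1)[0]
    return key, n / total


if __name__ == "__main__":
    # Made-up sample: one key accounts for most of the rows.
    sample = ["a"] * 8 + ["b", "c"]
    key, share = key_skew(sample)
    print(f"heaviest key {key!r} holds {share:.0%} of sampled rows")
```

A large share for a single key suggests that the reducer handling that key will receive far more rows than the others.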

Re: map join and OOM

2010-02-25 Thread Edward Capriolo
On Thu, Feb 25, 2010 at 1:19 PM, Ning Zhang wrote: > Edward, > Multi-table joins are practical and we do a lot of these here.  If the OOM > exception was thrown from a regular reduce-side join, it may be caused by > skewness in your join keys. > From branch-0.5 and forward, you will have a paramet

Re: map join and OOM

2010-02-25 Thread Edward Capriolo
t eats too much memory. Do > you get a good balance here? > > Thanks. > > -Gang > > > > - Original Message > From: Yongqiang He > To: hive-user@hadoop.apache.org > Sent: 2010/2/19 (Fri) 12:39:30 AM > Subject: Re: map join and OOM > > Actually Hive-917 only help

Re: map join and OOM

2010-02-19 Thread Gang Luo
- Original Message From: Yongqiang He To: hive-user@hadoop.apache.org Sent: 2010/2/19 (Fri) 12:39:30 AM Subject: Re: map join and OOM Actually, HIVE-917 only helps when the joined tables are bucketed. With hive-trunk (not sure about 0.5), there will no longer be an OOM in Hive's mapjoin, no matte

Re: map join and OOM

2010-02-18 Thread Yongqiang He
va is not a memory-efficient data structure (of course, this >> really depends on the number of records and the length of each record). I think >> Map Join can only handle very small tables (100 MB or so). >> >> -Gang >> >> >> - Original Message >> From: Ed

Re: map join and OOM

2010-02-18 Thread Ning Zhang
> -Gang >> >> >> - Original Message >> From: Edward Capriolo >> To: hive-user@hadoop.apache.org >> Sent: 2010/2/18 (Thu) 5:45:10 PM >> Subject: map join and OOM >> >> I have Hive 4.1-rc2. My query runs in Time taken: 312.956 seconds >> us

Re: map join and OOM

2010-02-18 Thread Edward Capriolo
To: hive-user@hadoop.apache.org > Sent: 2010/2/18 (Thu) 5:45:10 PM > Subject: map join and OOM > > I have Hive 4.1-rc2. My query runs in Time taken: 312.956 seconds > using the map/reduce join. I was interested in using mapjoin, but I get > an OOM error. > >

Re: map join and OOM

2010-02-18 Thread Gang Luo
structure (of course, this really depends on the number of records and the length of each record). I think Map Join can only handle very small tables (100 MB or so). -Gang - Original Message From: Edward Capriolo To: hive-user@hadoop.apache.org Sent: 2010/2/18 (Thu) 5:45:10 PM Subject: map join and OOM I

Re: map join and OOM

2010-02-18 Thread Zheng Shao
https://issues.apache.org/jira/browse/HIVE-917 might be what you want (assuming both of the tables are already bucketed on the join column). Zheng On Thu, Feb 18, 2010 at 2:53 PM, Ning Zhang wrote: > 1GB of the small table is usually too large for map-side joins. If the raw > data is 1GB, it cou
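
To illustrate why bucketing on the join column can reduce memory pressure, here is a minimal sketch of the general bucket-by-bucket idea, not Hive's implementation of HIVE-917. It assumes both tables are bucketed with the same function and bucket count; `bucket_of`, `bucketize`, and `bucketed_map_join` are hypothetical names, and Python's built-in `hash` stands in for the real bucketing function.

```python
from collections import defaultdict


def bucket_of(key, num_buckets):
    # Assumption: built-in hash stands in for the table's bucketing function.
    return hash(key) % num_buckets


def bucketize(rows, num_buckets):
    """Split (key, value) rows into num_buckets lists by join key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucketed_map_join(big_buckets, small_buckets):
    """Join bucket i of the big table against bucket i of the small table.

    Only one small-table bucket is held in the lookup table at a time.
    """
    for big, small in zip(big_buckets, small_buckets):
        lookup = defaultdict(list)
        for key, value in small:
            lookup[key].append(value)
        for key, value in big:
            for match in lookup.get(key, ()):
                yield key, value, match


if __name__ == "__main__":
    # Made-up rows for illustration.
    big = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
    small = [(1, "x"), (3, "y"), (4, "z")]
    joined = bucketed_map_join(bucketize(big, 4), bucketize(small, 4))
    print(sorted(joined))
```

Because matching keys always land in the same bucket number on both sides, each bucket pair can be joined independently, and the in-memory lookup table never has to hold the whole small table.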

Re: map join and OOM

2010-02-18 Thread Ning Zhang
1GB of the small table is usually too large for map-side joins. If the raw data is 1GB, it could be 10x larger when it is read into main memory as Java objects. Our default value is 10MB. Another factor to determine whether to use map-side join is the number of rows in the small table. If it i
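
The arithmetic behind this estimate can be sketched as follows. The 10x expansion factor comes from the message above; the helper names and the 1 GB heap budget are assumptions chosen only for illustration, and real overhead depends on the number of rows and the record layout.

```python
# Rough heap estimate for the small table of a map-side join.
# EXPANSION_FACTOR reflects the "could be 10x larger" remark in the thread;
# the helper names and the 1 GB heap budget are illustrative assumptions.
EXPANSION_FACTOR = 10

MB = 1024 * 1024
GB = 1024 * MB


def estimated_heap_bytes(raw_bytes, factor=EXPANSION_FACTOR):
    """Estimate the in-memory footprint of raw table data loaded as objects."""
    return raw_bytes * factor


def fits_in_heap(raw_bytes, heap_bytes):
    """True if the estimated footprint is within the given heap budget."""
    return estimated_heap_bytes(raw_bytes) <= heap_bytes


if __name__ == "__main__":
    for raw in (10 * MB, 100 * MB, 1 * GB):
        print(
            raw // MB, "MB raw ->",
            estimated_heap_bytes(raw) // MB, "MB estimated in memory;",
            "fits in a 1 GB heap:", fits_in_heap(raw, 1 * GB),
        )
```

Under these assumptions a 1 GB raw table needs roughly 10 GB of heap, while a 10 MB table stays near 100 MB.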

map join and OOM

2010-02-18 Thread Edward Capriolo
I have Hive 4.1-rc2. My query runs in Time taken: 312.956 seconds using the map/reduce join. I was interested in using mapjoin, but I get an OOM error. hive> java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.hadoop.hive.ql.util.jdbm.recman.RecordFile.getNewNode(RecordFile.