org"
> 发送日期: 2010/2/25 (周四) 1:19:16 下午
> 主 题: Re: map join and OOM
>
>
> Edward,
>
> Multi-table joins are practical and we do a lot of these here. If the OOM
> exception was thrown from a regular reduce-side join, it may be caused by
> skew
This seems to be a Hadoop issue. We didn't run Hadoop 0.18 or 0.19 in house.
Can you try on Hadoop 0.17 or 0.20 if possible?
Thanks,
Ning
On Feb 25, 2010, at 11:48 AM, Edward Capriolo wrote:
> On Thu, Feb 25, 2010 at 1:19 PM, Ning Zhang wrote:
>> Edward,
>> Multi-table joins are practical and w
...o the join there?

Thanks.
-Gang

----- Original Message -----
From: Ning Zhang
To: "hive-user@hadoop.apache.org"
Sent: 2010/2/25 (Thu) 1:19:16 PM
Subject: Re: map join and OOM

Edward,

Multi-table joins are practical and we do a lot of these here. If the OOM
exception was thrown from a regular reduce-side join, it may be caused by
skewness in your join keys.
From branch-0.5 and forward, you will have a parameter hive.join.cache.size to
control how many rows you want to cache in memory.
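As a rough illustration of the knob Ning mentions (the table names fact_a and fact_b and the value 10000 are made up for the example; the right number depends on row width and reducer heap), a reduce-side join session might look like:

-- hive.join.cache.size caps how many rows of the non-streamed join inputs the
-- join operator keeps in memory at a time; lowering it trades memory for disk
-- I/O and can help when skewed keys pile many rows onto a single reducer.
set hive.join.cache.size=10000;

-- The last table in the join (fact_b here) is streamed, the others are cached.
SELECT a.k, a.v, b.v
FROM fact_a a
JOIN fact_b b ON (a.k = b.k);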
> ...it eats too much memory. Do
> you get a good balance here?
>
> Thanks.
>
> -Gang

----- Original Message -----
From: Yongqiang He
To: hive-user@hadoop.apache.org
Sent: 2010/2/19 (Fri) 12:39:30 AM
Subject: Re: map join and OOM

Actually HIVE-917 only helps when the joining tables are bucketed.
With hive-trunk (not sure about 0.5), there will not be OOM anymore in
Hive's mapjoin, no matter ...
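For reference, a minimal sketch of the bucketed setup HIVE-917 relies on (the table names big_t/small_t and the bucket count 32 are made up; both sides must be bucketed on the join column and the data must actually be bucketed):

CREATE TABLE big_t   (k INT, v STRING) CLUSTERED BY (k) INTO 32 BUCKETS;
CREATE TABLE small_t (k INT, v STRING) CLUSTERED BY (k) INTO 32 BUCKETS;

-- Data has to really land in those buckets, e.g. set hive.enforce.bucketing=true
-- before the INSERTs that populate the tables.

set hive.optimize.bucketmapjoin=true;

-- Each mapper then only loads the matching bucket of small_t, not the whole table.
SELECT /*+ MAPJOIN(s) */ b.k, b.v, s.v
FROM big_t b JOIN small_t s ON (b.k = s.k);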
>> ...Java is not a memory-efficient data structure (of course, this really
>> depends on the number of records and the length of each record). I think
>> Map Join can only handle a very small table (100 MB or so).
>>
>> -Gang
>>
>> ----- Original Message -----
>> From: Edward Capriolo
>> To: hive-user@hadoop.apache.org
>> Sent: 2010/2/18 (Thu) 5:45:10 PM
>> Subject: map join and OOM
>>
>> I have Hive 0.4.1-rc2. My query runs in Time taken: 312.956 seconds
>> using the map/reduce join. I was interested in using mapjoin, but I get
>> an OOM error.
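For context, the map join Edward says he wanted to try is requested with a query hint; a minimal sketch with made-up table names fact and dim, which is only safe when the hinted table is genuinely small:

-- The table named in the MAPJOIN hint is pulled into each mapper as an in-memory
-- hash table, so no reduce phase is needed; if that table is too large (per Gang,
-- beyond roughly 100 MB), the mapper runs out of heap much like Edward describes.
SELECT /*+ MAPJOIN(d) */ f.k, f.v, d.label
FROM fact f
JOIN dim d ON (f.k = d.k);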
https://issues.apache.org/jira/browse/HIVE-917 might be what you want
(assuming both tables are already bucketed on the join column).
Zheng
On Thu, Feb 18, 2010 at 2:53 PM, Ning Zhang wrote:

> 1GB of the small table is usually too large for map-side joins. If the raw
> data is 1GB, it could be 10x larger when it is read into main memory as Java
> objects. Our default value is 10MB.
>
> Another factor in deciding whether to use a map-side join is the number of
> rows in the small table. If it is ...
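One hedged way to stay under those limits, assuming a hypothetical wide dimension table dim_wide with an active flag: materialize only the columns and rows the join needs, then hint the map join on the narrow copy.

-- The narrow copy keeps the raw size (and row count) small enough that the
-- roughly 10x Java-object overhead Ning mentions still fits in the mapper heap.
CREATE TABLE dim_narrow (k INT, label STRING);

INSERT OVERWRITE TABLE dim_narrow
SELECT k, label FROM dim_wide WHERE active = 1;

SELECT /*+ MAPJOIN(d) */ f.k, f.v, d.label
FROM fact f
JOIN dim_narrow d ON (f.k = d.k);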
----- Original Message -----
From: Edward Capriolo
To: hive-user@hadoop.apache.org
Sent: 2010/2/18 (Thu) 5:45:10 PM
Subject: map join and OOM

I have Hive 0.4.1-rc2. My query runs in Time taken: 312.956 seconds
using the map/reduce join. I was interested in using mapjoin, but I get
an OOM error:

hive>
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.hive.ql.util.jdbm.recman.RecordFile.getNewNode(RecordFile. ...