Interesting. You mean to say there is no bug in Hive, but in some other
component (YARN / MR). The stack trace seems to indicate there is something
going on on the Hive side as well. Granted, the stack trace of 9324 is not
identical to yours, but both point to a problem in a similar area.

On Thu, Jan 15, 2015 at 7:53 PM, Guodong Wang <wangg...@gmail.com> wrote:

> Hi Ashutosh,
>
> Thanks for your reply.
>
> Not sure if HIVE-9324 is the same issue we hit.
> We found it is a bug in CDH when using MR1 with Hive 0.13.1. The bug does
> not occur when using YARN with 0.13.1.
>
>
>
> Guodong
>
> On Fri, Jan 16, 2015 at 1:21 AM, Ashutosh Chauhan <hashut...@apache.org>
> wrote:
>
>> Seems like you are running into:
>> https://issues.apache.org/jira/browse/HIVE-9324
>>
>> On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang <wangg...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am using hive 0.13.1 and currently I am blocked by a bug when joining
>>> 2 tables. Here is the sample query.
>>>
>>> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name, type)
>>> SELECT COALESCE(b.resource_id, a.id) AS id,
>>>        a.timestamp,
>>>        a.payload,
>>>        a.name,
>>>        a.type
>>> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
>>> WHERE a.date='2015-01-17'
>>>     AND a.name IN ('a', 'b', 'c')
>>>     AND a.type <= 14;
>>>
>>> It turns out that when there are more than 25,000 joined rows for a
>>> specific id, the Hive MR job fails, throwing a NegativeArraySizeException.
>>>
>>> Here is the stack trace
>>>
>>> 2015-01-15 14:38:42,693 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: java.lang.NegativeArraySizeException
>>>     at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>>     at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>>     at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> 2015-01-15 14:38:42,707 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>>>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>>>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>>>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>>>     ... 11 more
>>> Caused by: java.lang.NegativeArraySizeException
>>>     at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>>>     at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>>>     at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>>>     ... 12 more
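>>>
>>> For what it's worth, my guess at the top frames (which may well be wrong)
>>> is an int overflow while the BytesWritable buffer is resized during
>>> deserialization: readFields() reads a length from the spilled SequenceFile
>>> record and passes it to setSize(), which in the Hadoop versions I looked at
>>> grows the capacity with plain int arithmetic. A self-contained sketch of
>>> that arithmetic (my approximation, not the exact CDH source) is below.
>>>
>>> // Rough sketch, assuming older BytesWritable resize logic; the class name,
>>> // method bodies, and the bogus length value are illustrative only.
>>> public class NegativeResizeSketch {
>>>
>>>     // Mimics BytesWritable.setCapacity(): allocates the backing array.
>>>     // A negative capacity makes `new byte[...]` throw
>>>     // NegativeArraySizeException, the top frame in the trace above.
>>>     static byte[] setCapacity(int capacity) {
>>>         return new byte[capacity];
>>>     }
>>>
>>>     // Mimics the old setSize(): grows capacity to size * 3 / 2 using
>>>     // plain int arithmetic, which wraps negative once 3 * size no longer
>>>     // fits in an int (e.g. a garbage length read from a damaged record).
>>>     static byte[] setSize(int size) {
>>>         return setCapacity(size * 3 / 2);
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         int bogusLength = 1_000_000_000;           // hypothetical length field
>>>         System.out.println(bogusLength * 3 / 2);   // prints a negative number
>>>         setSize(bogusLength);                      // NegativeArraySizeException
>>>     }
>>> }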
>>>
>>>
>>> I found that when the exception is thrown, there is a log entry like this:
>>>
>>> 2015-01-15 14:38:42,045 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /local/data0/mapred/taskTracker/ubuntu/jobcache/job_201412171918_0957/attempt_201412171918_0957_r_000000_0/work/tmp/hive-rowcontainer5023288010679723993/RowContainer5093924743042924240.tmp
>>>
>>>
>>> It looks like when the RowContainer collects more than 25,000 row records,
>>> it flushes the block out to local disk, but then it cannot read those
>>> blocks back.
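>>>
>>> If I am reading the code right, the 25,000 threshold matches the default
>>> of hive.join.cache.size (25000), which controls how many join rows are
>>> cached in memory before the RowContainer spills a block to local disk.
>>> As an experiment (not a fix), raising that value so our largest keys stay
>>> in memory should avoid the spill-and-reread path entirely, at the cost of
>>> extra reducer heap.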
>>>
>>> Any help is really appreciated!
>>>
>>>
>>>
>>> Guodong
>>>
>>
>>
>
