Seems like you are hitting:
https://issues.apache.org/jira/browse/HIVE-9324
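
For what it's worth, the trace bottoms out in BytesWritable.setCapacity while RowContainer is reading a spilled block back from local disk, which looks like an int overflow while growing the value buffer (either a genuinely huge serialized block or a bogus length read from the spill file would produce it). Below is a minimal Java sketch of that failure mode; GrowableBuffer is a hypothetical stand-in for BytesWritable, and the 3/2 growth factor is an assumption based on the Hadoop source of that era, so treat it as an illustration rather than the exact code path.

// Hypothetical stand-in for org.apache.hadoop.io.BytesWritable, trimmed to
// the two methods that appear in the stack trace (setSize / setCapacity).
public class GrowableBuffer {
    private byte[] bytes = new byte[0];
    private int size = 0;

    // Assumed growth pattern: when the requested size exceeds the current
    // capacity, grow the backing array by roughly 3/2 of the request.
    public void setSize(int newSize) {
        if (newSize > bytes.length) {
            // newSize * 3 overflows int once newSize passes
            // Integer.MAX_VALUE / 3 bytes, so the computed capacity goes
            // negative and the allocation below blows up.
            setCapacity(newSize * 3 / 2);
        }
        size = newSize;
    }

    private void setCapacity(int newCapacity) {
        // Throws java.lang.NegativeArraySizeException if newCapacity < 0.
        byte[] grown = new byte[newCapacity];
        System.arraycopy(bytes, 0, grown, 0, Math.min(size, newCapacity));
        bytes = grown;
    }

    public static void main(String[] args) {
        GrowableBuffer buf = new GrowableBuffer();
        // e.g. a corrupt or oversized length field read from the spill file
        int hugeRecordLength = Integer.MAX_VALUE / 3 + 1;
        buf.setSize(hugeRecordLength); // java.lang.NegativeArraySizeException
    }
}

Also, the 25000 you are seeing is probably not a property of your data: if I remember right, it matches the default of hive.join.cache.size, which is the number of rows per join key the reducer buffers before RowContainer spills a block to local disk, so the failure shows up exactly when a key crosses that threshold and the spill/read-back path kicks in.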

On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang <wangg...@gmail.com> wrote:

> Hi,
>
> I am using hive 0.13.1 and currently I am blocked by a bug when joining 2
> tables. Here is the sample query.
>
> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name,
> type)
> SELECT COALESCE(b.resource_id, a.id) AS id,
>        a.timestamp,
>        a.payload,
>        a.name,
>        a.type
> FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
> WHERE a.date='2015-01-17'
>     AND a.name IN ('a', 'b', 'c')
>     AND a.type <= 14;
>
> It turns out that when there are more than 25000 joined rows for a specific
> id, the Hive MR job fails, throwing a NegativeArraySizeException.
>
> Here is the stack trace
>
> 2015-01-15 14:38:42,693 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: java.lang.NegativeArraySizeException
>       at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>       at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>       at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>       at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>       at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>       at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2015-01-15 14:38:42,707 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>       at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>       at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>       ... 11 more
> Caused by: java.lang.NegativeArraySizeException
>       at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>       at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>       at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>       at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>       at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>       at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>       ... 12 more
>
>
> I found that when the exceptions are thrown, there is a log line like this:
>
> 2015-01-15 14:38:42,045 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /local/data0/mapred/taskTracker/ubuntu/jobcache/job_201412171918_0957/attempt_201412171918_0957_r_000000_0/work/tmp/hive-rowcontainer5023288010679723993/RowContainer5093924743042924240.tmp
>
>
> It looks like once RowContainer collects more than 25000 rows, it flushes
> the block out to local disk, but it then cannot read those blocks back.
>
> Any help is really appreciated!
>
>
>
> Guodong
>
