Seems like you are hitting https://issues.apache.org/jira/browse/HIVE-9324.
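Until that is fixed, two settings-level workarounds may be worth trying. This is only a sketch, under the assumption that the spill is triggered by the default in-memory join cache of 25000 rows (hive.join.cache.size); the exact values below are placeholders to tune against your reducer heap and table sizes.

-- Sketch of possible workarounds (not a confirmed fix for HIVE-9324):

-- 1) Cache more joined rows per key in memory so RowContainer does not spill
--    to local disk. The default hive.join.cache.size is 25000, which matches
--    the ~25000-row threshold described below; raising it costs reducer heap.
SET hive.join.cache.size=100000;

-- 2) Or let Hive convert the join to a map-side join when id_mapping is small
--    enough, which avoids the reduce-side RowContainer path entirely.
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=250000000;  -- bytes; size to fit id_mapping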
On Thu, Jan 15, 2015 at 1:53 AM, Guodong Wang <wangg...@gmail.com> wrote:
> Hi,
>
> I am using Hive 0.13.1 and currently I am blocked by a bug when joining 2
> tables. Here is the sample query.
>
> INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name, type)
> SELECT COALESCE(b.resource_id, a.id) AS id,
>        a.timstamp,
>        a.payload,
>        a.name,
>        a.type
> FROM test_data a LEFT OUTER JOIN id_mapping b ON a.id = b.id
> WHERE a.date='2015-01-17'
>   AND a.name IN ('a', 'b', 'c')
>   AND a.type <= 14;
>
> It turns out that when there are more than 25000 joined rows for a specific
> id, the Hive MR job fails, throwing NegativeArraySizeException.
>
> Here is the stack trace:
>
> 2015-01-15 14:38:42,693 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
> java.lang.NegativeArraySizeException
>     at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>     at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>     at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2015-01-15 14:38:42,707 FATAL ExecReducer:
> org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
>     at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
>     ... 11 more
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>     at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>     at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>     at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
>     ... 12 more
>
> I found that when the exceptions are thrown, there is a log line like this:
>
> 2015-01-15 14:38:42,045 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file
> /local/data0/mapred/taskTracker/ubuntu/jobcache/job_201412171918_0957/attempt_201412171918_0957_r_000000_0/work/tmp/hive-rowcontainer5023288010679723993/RowContainer5093924743042924240.tmp
>
> It looks like once RowContainer collects more than 25000 row records, it
> flushes the block out to local disk, but then it cannot read these blocks
> back.
>
> Any help is really appreciated!
>
> Guodong
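A related sketch of the same idea at the query level, assuming id_mapping is small enough to hold in memory: an explicit MAPJOIN hint keeps the join on the map side, so the reduce-side RowContainer spill that fails above never happens. Treat this as illustrative only; the hint may be ignored unless hive.ignore.mapjoin.hint is set to false in your configuration.

-- Sketch only: map-join variant of the original query (assumes id_mapping is small).
SET hive.ignore.mapjoin.hint=false;  -- the hint may otherwise be ignored in 0.13

INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name, type)
SELECT /*+ MAPJOIN(b) */
       COALESCE(b.resource_id, a.id) AS id,
       a.timstamp,
       a.payload,
       a.name,
       a.type
FROM test_data a LEFT OUTER JOIN id_mapping b ON a.id = b.id
WHERE a.date='2015-01-17'
  AND a.name IN ('a', 'b', 'c')
  AND a.type <= 14;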