2018-09-17 11:20:26,404 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
        at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:864)
        at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
        at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4897)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4813)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:5005)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:5000)
        at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15836)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15744)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15886)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15881)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.parseFrom(OrcProto.java:16247)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFooter(ReaderImpl.java:459)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:438)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:480)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1546)
        at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:655)
        at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:633)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
________________________________
From: Owen O'Malley <[email protected]>
Sent: Monday, September 17, 2018 11:28:43 AM
To: [email protected]
Subject: Re: Hive Compaction OOM
Shawn,
Can you provide the stack trace that you get with the OOM?
Thanks,
Owen
On Mon, Sep 17, 2018 at 9:27 AM Prasanth Jayachandran <[email protected]> wrote:
Hi Shawn
You might be running into issues related to huge protobuf objects created for huge string columns. Without https://issues.apache.org/jira/plugins/servlet/mobile#issue/ORC-203, there isn't an option other than providing sufficiently large memory. If you can reload the data with a binary type, that should help avoid this issue.
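Something along these lines for the memory route (untested, and the table name is a placeholder). The compactor job is launched by the metastore, so the MapReduce memory settings need to be visible to the metastore process rather than set in your session:

    -- In the hive-site.xml / mapred-site.xml that the metastore reads (example values only):
    --   mapreduce.map.memory.mb = 8192     (YARN container size for the compactor map task)
    --   mapreduce.map.java.opts = -Xmx6g   (map task JVM heap)
    -- After restarting the metastore, re-queue a major compaction
    -- (add a PARTITION spec if the table is partitioned):
    ALTER TABLE my_streaming_table COMPACT 'major';
    SHOW COMPACTIONS;   -- confirm the compaction finishes instead of failing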
Thanks
Prasanth
On Mon, Sep 17, 2018 at 9:10 AM -0700, "Shawn Weeks" <[email protected]> wrote:
Let me start off by saying I've backed myself into a corner and would rather
not reprocess the data if possible. I have a Hive transactional table in Hive
1.2.1 that was loaded via NiFi Hive Streaming with a fairly large string
column containing XML documents. Awful, I know, and I'm working on changing how
the data gets loaded. But I've got this table with so many deltas that the
Hive compaction runs out of memory, and so does any query on the table.
Any ideas on how I might get the data out of the table and split it
into more buckets or something?
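Roughly what I had in mind, if the existing table can even be read; every name, the column type, and the bucket count below are just placeholders:

    -- New transactional ORC table with more buckets, storing the XML payload as binary:
    CREATE TABLE events_v2 (
      id          BIGINT,
      xml_payload BINARY
    )
    CLUSTERED BY (id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    -- Copy the data across, casting the string column to binary
    -- (on Hive 1.x this may also need: SET hive.enforce.bucketing=true;):
    INSERT INTO TABLE events_v2
    SELECT id, CAST(xml_doc AS BINARY)
    FROM events_v1;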
Thanks
Shawn Weeks