Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).
For now I have mitigated the problem by recreating the table, so I don't
have the relevant ORC files right now.
Also, I am curious: how would "*hive.acid.key.index*" help in debugging
this problem?
I was going through the source code, and it seems the following method is
where the problem lies:
/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for (StripeInformation stripe : stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}
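If it helps, here is a minimal, self-contained sketch (all stripe offsets and lengths are hypothetical, not from my table) of how I suspect the loop above can compute an index past the end of keyIndex: parseKeyIndex() is expected to yield one RecordIdentifier per stripe, so if "*hive.acid.key.index*" has fewer entries than the file has stripes, keyIndex[firstStripe + stripeCount - 1] overflows.

```java
// Sketch of the stripe-scanning arithmetic in discoverKeyBounds().
// KeyBoundsSketch, maxKeyIndex, and all numeric values are made up
// for illustration only.
public class KeyBoundsSketch {

    /** Mirrors the loop and returns the index used for maxKey. */
    static int maxKeyIndex(long[] stripeOffsets, long offset, long maxOffset) {
        int firstStripe = 0;
        int stripeCount = 0;
        for (long stripeOffset : stripeOffsets) {
            if (offset > stripeOffset) {
                firstStripe += 1;        // stripe entirely before the split
            } else if (maxOffset > stripeOffset) {
                stripeCount += 1;        // stripe covered by the split
            } else {
                break;                   // split ends earlier: isTail = false
            }
        }
        return firstStripe + stripeCount - 1;
    }

    public static void main(String[] args) {
        // 10 stripes, but a key index with only 8 entries; the split
        // covers stripes 1..9 and ends before stripe 10.
        long[] stripeOffsets = {0, 100, 200, 300, 400, 500, 600, 700, 800, 900};
        int keyIndexLength = 8;
        int idx = maxKeyIndex(stripeOffsets, 0L, 850L);
        // idx is 8 here, so keyIndex[idx] on an 8-element array would throw
        // ArrayIndexOutOfBoundsException: 8, matching the stack trace below.
        System.out.println("maxKey index = " + idx
            + ", keyIndex.length = " + keyIndexLength);
    }
}
```

So one plausible trigger is a bucket file whose key index in the footer is missing or truncated relative to its stripe count.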
If this is still an open issue, I would like to submit a patch for it.
Let me know how I can debug this issue further.
Thanks,
Aviral Agarwal
On Feb 15, 2018 23:10, "Eugene Koifman" <[email protected]> wrote:
> What version of Hive is this?
>
>
>
> Can you isolate this to a specific partition?
>
>
>
> The table/partition you are reading should have a directory called base_x/
> with several bucket_0000N files. (If you see more than one base_x, take the
> one with the highest x.)
>
>
>
> Each bucket_0000N should have a “*hive.acid.key.index*” property in the user
> metadata section of the ORC footer.
>
> Could you share the value of this property?
>
>
>
> You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility)
> for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
>
>
>
> Thanks,
>
> Eugene
>
>
>
>
>
> *From: *Aviral Agarwal <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Thursday, February 15, 2018 at 2:08 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *ORC ACID table returning Array Index Out of Bounds
>
>
>
> Hi guys,
>
>
>
> I am running into the following error when querying an ACID table:
>
>
>
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
>         at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
>         at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
>         at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
>         at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
>         at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>         at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
>         ... 14 more
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
>         ... 25 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
>         ... 26 more
>
>
>
>
> Any help would be appreciated.
>
>
> Regards,
>
> Aviral Agarwal
>