Hive version is 1.2.1000.2.6.1.0-0129 (HDP 2.6.1.0).
For now I have mitigated the problem by recreating the table, so I don't
have the relevant ORC files right now.
Also, I am curious: how would "*hive.acid.key.index*" help in debugging
this problem?
I was going through the source code, and it seems the following method is
where the problem lies:
/**
 * Find the key range for bucket files.
 * @param reader the reader
 * @param options the options for reading with
 * @throws IOException
 */
private void discoverKeyBounds(Reader reader,
                               Reader.Options options) throws IOException {
  RecordIdentifier[] keyIndex = OrcRecordUpdater.parseKeyIndex(reader);
  long offset = options.getOffset();
  long maxOffset = options.getMaxOffset();
  int firstStripe = 0;
  int stripeCount = 0;
  boolean isTail = true;
  List<StripeInformation> stripes = reader.getStripes();
  for (StripeInformation stripe : stripes) {
    if (offset > stripe.getOffset()) {
      firstStripe += 1;
    } else if (maxOffset > stripe.getOffset()) {
      stripeCount += 1;
    } else {
      isTail = false;
      break;
    }
  }
  if (firstStripe != 0) {
    minKey = keyIndex[firstStripe - 1];
  }
  if (!isTail) {
    maxKey = keyIndex[firstStripe + stripeCount - 1];
  }
}
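If it helps, here is a minimal, self-contained sketch (all stripe offsets and lengths are hypothetical, not from my table) of how I suspect the loop above can compute an index past the end of keyIndex: parseKeyIndex() is expected to yield one RecordIdentifier per stripe, so if "*hive.acid.key.index*" has fewer entries than the file has stripes, keyIndex[firstStripe + stripeCount - 1] overflows.

```java
// Sketch of the stripe-scanning arithmetic in discoverKeyBounds().
// KeyBoundsSketch, maxKeyIndex, and all numeric values are made up
// for illustration only.
public class KeyBoundsSketch {

    /** Mirrors the loop and returns the index used for maxKey. */
    static int maxKeyIndex(long[] stripeOffsets, long offset, long maxOffset) {
        int firstStripe = 0;
        int stripeCount = 0;
        for (long stripeOffset : stripeOffsets) {
            if (offset > stripeOffset) {
                firstStripe += 1;        // stripe entirely before the split
            } else if (maxOffset > stripeOffset) {
                stripeCount += 1;        // stripe covered by the split
            } else {
                break;                   // split ends earlier: isTail = false
            }
        }
        return firstStripe + stripeCount - 1;
    }

    public static void main(String[] args) {
        // 10 stripes, but a key index with only 8 entries; the split
        // covers stripes 1..9 and ends before stripe 10.
        long[] stripeOffsets = {0, 100, 200, 300, 400, 500, 600, 700, 800, 900};
        int keyIndexLength = 8;
        int idx = maxKeyIndex(stripeOffsets, 0L, 850L);
        // idx is 8 here, so keyIndex[idx] on an 8-element array would throw
        // ArrayIndexOutOfBoundsException: 8, matching the stack trace below.
        System.out.println("maxKey index = " + idx
            + ", keyIndex.length = " + keyIndexLength);
    }
}
```

So one plausible trigger is a bucket file whose key index in the footer is missing or truncated relative to its stripe count.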
If this is still an open issue, I would like to submit a patch for it.
Let me know how I can debug this issue further.
Thanks,
Aviral Agarwal
On Feb 15, 2018 23:10, "Eugene Koifman" <[email protected]> wrote:
> What version of Hive is this?
>
>
>
> Can you isolate this to a specific partition?
>
>
>
> The table/partition you are reading should have a directory called base_x/
> with several bucket_0000N files. (If you see more than one base_x, take the
> one with the highest x.)
>
>
>
> Each bucket_0000N should have a “*hive.acid.key.index*” property in the user
> metadata section of the ORC footer.
>
> Could you share the value of this property?
>
>
>
> You can use orcfiledump (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility)
> for this, but it requires https://issues.apache.org/jira/browse/ORC-223.
>
>
>
> Thanks,
>
> Eugene
>
>
>
>
>
> *From: *Aviral Agarwal <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Thursday, February 15, 2018 at 2:08 AM
> *To: *"[email protected]" <[email protected]>
> *Subject: *ORC ACID table returning Array Index Out of Bounds
>
>
>
> Hi guys,
>
>
>
> I am running into the following error when querying an ACID table:
>
>
>
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:135)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:101)
>         at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:149)
>         at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
>         at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:674)
>         at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:633)
>         at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>         at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:405)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:124)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
>         ... 14 more
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:253)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
>         ... 25 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:378)
>         at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:447)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1436)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1323)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:251)
>         ... 26 more
>
>
>
>
> Any help would be appreciated.
>
>
> Regards,
>
> Aviral Agarwal
>