Hi! The workaround "set hive.optimize.index.filter=false" doesn't work: select max(length(url)) still fails with an ArrayIndexOutOfBoundsException.

Kind regards
Wojciech Indyk
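P.S. A minimal sketch of the failing session, in case it helps (the table name "mytable" is illustrative, not the real one):

    -- workaround suggested for HIVE-6320: disable ORC predicate pushdown
    set hive.optimize.index.filter=false;
    -- still fails in the ORC string reader with ArrayIndexOutOfBoundsException
    select max(length(url)) from mytable;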
2014-08-11 8:49 GMT+02:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:
> Hi
>
> My suspicion is that the error is caused by this issue:
> https://issues.apache.org/jira/browse/HIVE-6320
> Applying this patch should resolve the issue. The alternative workaround
> would be to "set hive.optimize.index.filter=false".
>
> Thanks
> Prasanth Jayachandran
>
> On Aug 10, 2014, at 11:45 PM, Wojciech Indyk <wojciechin...@gmail.com>
> wrote:
>
> Hi!
> I use CDH 5.1.0 (5.0.3 until recently) with Hive 0.12.
> I created an ORC table with Snappy compression, consisting of some
> integer and string columns. I imported a few ~40 GB gz files into HDFS,
> mapped them as an external table, then inserted from the external table
> into the ORC table.
>
> Unfortunately, when I want to process two string columns (url and
> refererurl) I get an error:
>
> Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
>     ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>     ... 15 more
>
> The error occurs for one mapper per file. E.g. with two 40 GB files in
> my ORC table, Hive creates 300 mappers for a query and only 2 of them
> fail with the error above (each with a different array index).
> When I process other columns (both int and string types), processing
> finishes correctly. I see the error is related to StringTreeReader.
> What is the default delimiter for ORC columns? Maybe the delimiter
> occurs inside the string value of the failing record? But I think that
> shouldn't cause an ArrayIndexOutOfBoundsException...
> Is there any limit on string length in ORC? I know there is a default
> 128 MB stripe for ORC, but I don't expect a string as huge as 100 MB.
>
> Kind regards
> Wojciech Indyk
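P.P.S. For reference, a minimal sketch of the table setup described in my first message; the table names, column list, field delimiter, and HDFS path are all illustrative:

    -- external table over the gz text files already imported into HDFS
    -- (Hive reads .gz text files in an external table transparently)
    CREATE EXTERNAL TABLE logs_ext (
      id INT,
      url STRING,
      refererurl STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/logs';

    -- ORC table with Snappy compression
    CREATE TABLE logs_orc (
      id INT,
      url STRING,
      refererurl STRING
    )
    STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");

    -- copy from the external table into the ORC table
    INSERT OVERWRITE TABLE logs_orc SELECT * FROM logs_ext;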