Hi! The workaround "set hive.optimize.index.filter=false" doesn't work: select max(length(url)) still fails with an ArrayIndexOutOfBoundsException.

Kind regards
Wojciech Indyk
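P.S. A minimal sketch of the failing session, in case it helps (the table name "mytable" is illustrative, not the real one):

    -- workaround suggested for HIVE-6320: disable ORC predicate pushdown
    set hive.optimize.index.filter=false;
    -- still fails in the ORC string reader with ArrayIndexOutOfBoundsException
    select max(length(url)) from mytable;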
2014-08-11 8:49 GMT+02:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:
> Hi
>
> My suspicion is that the error is caused by this issue:
> https://issues.apache.org/jira/browse/HIVE-6320
> Applying this patch should resolve the issue. The alternative workaround
> would be to "set hive.optimize.index.filter=false".
>
> Thanks
> Prasanth Jayachandran
>
> On Aug 10, 2014, at 11:45 PM, Wojciech Indyk <wojciechin...@gmail.com>
> wrote:
>
> Hi!
> I use CDH 5.1.0 (5.0.3 until recently) with Hive 0.12.
> I created an ORC table with Snappy compression, consisting of some
> integer and string columns. I imported a few ~40 GB gz files into HDFS,
> mapped them as an external table, then inserted from the external table
> into the ORC table.
>
> Unfortunately, when I want to process two string columns (url and
> refererurl) I get an error:
>
> Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:305)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:303)
>     ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 625920
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.next(RecordReaderImpl.java:1060)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.next(RecordReaderImpl.java:892)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1193)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2240)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:105)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:56)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>     ... 15 more
>
> The error occurs for one mapper per file. E.g. with two 40 GB files in
> my ORC table, Hive creates 300 mappers for a query and only 2 of them
> fail with the error above (each with a different array index).
> When I process other columns (both int and string types), processing
> finishes correctly. I see the error is related to StringTreeReader.
> What is the default delimiter for ORC columns? Maybe the delimiter
> occurs inside the string value of the failing record? But I think that
> shouldn't cause an ArrayIndexOutOfBoundsException...
> Is there any limit on string length in ORC? I know there is a default
> 128 MB stripe for ORC, but I don't expect a string as huge as 100 MB.
>
> Kind regards
> Wojciech Indyk
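P.P.S. For reference, a minimal sketch of the table setup described in my first message; the table names, column list, field delimiter, and HDFS path are all illustrative:

    -- external table over the gz text files already imported into HDFS
    -- (Hive reads .gz text files in an external table transparently)
    CREATE EXTERNAL TABLE logs_ext (
      id INT,
      url STRING,
      refererurl STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/logs';

    -- ORC table with Snappy compression
    CREATE TABLE logs_orc (
      id INT,
      url STRING,
      refererurl STRING
    )
    STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");

    -- copy from the external table into the ORC table
    INSERT OVERWRITE TABLE logs_orc SELECT * FROM logs_ext;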