Re: LLAP can't read ORC ZLIB files from S3

Owen O'Malley Thu, 25 Jun 2020 10:22:13 -0700

Actually, it looks like LLAP is calling ByteBuffer.array() on a direct byte
buffer, which has no accessible backing array, hence the
UnsupportedOperationException. Turning off direct byte buffers on read should
fix the problem.
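
For illustration, here is a minimal standalone sketch of the failure mode at the bottom of the stack trace (java.nio.ByteBuffer.array throwing UnsupportedOperationException). It is not the ORC or LLAP code; the class name DirectBufferDemo and its helper methods are hypothetical and only use standard java.nio calls. It shows that array() works on a heap buffer, throws on a direct buffer, and that checking hasArray() first lets a reader fall back to a bulk copy.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of Hive, LLAP, or ORC.
public class DirectBufferDemo {

    /** Returns true if calling array() on the buffer throws UnsupportedOperationException. */
    public static boolean arrayThrows(ByteBuffer buf) {
        try {
            buf.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** Copies the remaining bytes out of a buffer without assuming a backing array. */
    public static byte[] copyRemaining(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        if (buf.hasArray()) {
            // Heap buffer with an accessible array: read straight from it.
            System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(),
                             out, 0, out.length);
        } else {
            // Direct (or read-only) buffer: bulk get on a duplicate so the
            // caller's position is left untouched.
            buf.duplicate().get(out);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);
        ByteBuffer direct = ByteBuffer.allocateDirect(8);

        System.out.println("heap   array() throws: " + arrayThrows(heap));
        System.out.println("direct array() throws: " + arrayThrows(direct));
        System.out.println("bytes copied from direct buffer: "
                + copyRemaining(direct).length);
    }
}
```

The hasArray() guard is the usual way to make code tolerate both buffer kinds; whether the codec in this trace can be made to do so is a separate question from the workaround described above.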


.. Owen

On Thu, Jun 25, 2020 at 7:27 AM Aaron Grubb <aaron.gr...@clearpier.com>
wrote:

> This appears to have been caused by orc.write.variable.length.blocks=true
> which I had set for HDFS-based tables. Setting this to false and inserting
> data into the S3 table appears to have fixed this problem.
>
>
>
> *From:* Aaron Grubb <aaron.gr...@clearpier.com>
> *Sent:* Wednesday, June 24, 2020 4:04 PM
> *To:* user@hive.apache.org
> *Subject:* LLAP can't read ORC ZLIB files from S3
>
>
>
> Hello everyone,
>
>
>
> I’m encountering an error that I can’t find any information on. I’ve
> inserted data into a table with storage in S3 in ORC ZLIB format. I can
> query this data directly without issues. Running a query that requires LLAP
> causes the following error:
>
>
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>         at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>         at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>         ... 15 more
> Caused by: java.io.IOException: java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>         at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
>         at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>         at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>         at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>         at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
>         at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>         ... 17 more
> Caused by: java.io.IOException: java.lang.UnsupportedOperationException
>         at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readIndexStreams(EncodedReaderImpl.java:1954)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:384)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:263)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:260)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:260)
>         at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:109)
>         ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>         at java.nio.ByteBuffer.array(ByteBuffer.java:994)
>         at org.apache.orc.impl.ZlibCodec.decompress(ZlibCodec.java:94)
>         at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1283)
>         at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:902)
>         at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readIndexStreams(EncodedReaderImpl.java:1918)
>
>
>
> This is specific to ORC ZLIB files on S3 being processed through LLAP. I
> can query other types of files in S3 through LLAP, I can query the ORC ZLIB
> data on S3 directly (select * from orc_zlib_on_s3_table limit 10), and I can
> run the same query that fails in LLAP in native Tez containers. Does
> anyone have any suggestions as to what the problem might be or how to debug
> it?
>
>
>
> Thanks,
>
> Aaron
>
