Anyone got any ideas on this one? I can consistently reproduce the issue with HDFS - the minute I get the data off HDFS (to a local drive), it all works fine.
Doesn't seem to be a problem with Parquet - more like the HDFS storage engine.

On Tue, Jan 6, 2015 at 9:50 AM, Adam Gilmore <dragoncu...@gmail.com> wrote:

> The data is okay, because the exact same Parquet directory is working fine
> on the local drive, it's just not working when using HDFS. I tried casting
> as you said, but that ended up with the exact same problem.
>
> On Tue, Jan 6, 2015 at 9:49 AM, MapR <sth...@maprtech.com> wrote:
>
>> Please try casting the column data type. Also please verify that all the
>> column data is satisfying your data type.
>>
>> Sudhakar Thota
>> Sent from my iPhone
>>
>> > On Jan 5, 2015, at 5:56 AM, Adam Gilmore <dragoncu...@gmail.com> wrote:
>> >
>> > The actual stack trace is:
>> >
>> > 2015-01-05 13:48:27,356 [2b5569d5-3771-748d-1390-3a8930d02002:frag:1:12] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
>> > org.apache.drill.common.exceptions.DrillRuntimeException: java.io.IOException: can not read class parquet.format.PageHeader: don't know what type: 13
>> >     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:427) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:158) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:83) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:97) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>> >     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>> > Caused by: java.io.IOException: can not read class parquet.format.PageHeader: don't know what type: 13
>> >     at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:47) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:169) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:366) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:409) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> >     ... 15 common frames omitted
>> > Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 13
>> >     at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.org.apache.thrift.protocol.TCompactProtocol.readListBegin(TCompactProtocol.java:536) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.org.apache.thrift.protocol.TCompactProtocol.readSetBegin(TCompactProtocol.java:547) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:128) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.format.PageHeader.read(PageHeader.java:897) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> >     ... 21 common frames omitted
>> >
>> >> On Mon, Jan 5, 2015 at 6:26 PM, Adam Gilmore <dragoncu...@gmail.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I'm trying to do a really simple query on a parquet directory on HDFS.
>> >>
>> >> This works fine:
>> >>
>> >> select count(*) from hdfs.warehouse.saleparquet
>> >>
>> >> However, this fails:
>> >>
>> >> 0: jdbc:drill:local> select sum(sellprice) from hdfs.warehouse.saleparquet;
>> >> Query failed: Query failed: Failure while running fragment., You tried to do a batch data read operation when you were in a state of STOP. You can only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA. [ 92fc8807-220b-466c-bbac-1f524d4251cb on ip-10-8-1-154.ap-southeast-2.compute.internal:31010 ]
>> >> [ 92fc8807-220b-466c-bbac-1f524d4251cb on ip-10-8-1-154.ap-southeast-2.compute.internal:31010 ]
>> >>
>> >> Error: exception while executing query: Failure while executing query. (state=,code=0)
>> >>
>> >> Seems like a very simple query.
>> >>
>> >> Funnily enough, if I copy it off HDFS to the local system and run the same query against the local file, it works fine. Just purely something to do with HDFS.
>> >>
>> >> Any ideas? I'm running 0.7.