Re: Can't query parquet on HDFS

Adam Gilmore Wed, 07 Jan 2015 00:59:16 -0800

P.S. Here is a more extreme example (1M rows), which returns:

Query failed: Query failed: Failure while running fragment., You tried to
do a batch data read operation when you were in a state of STOP.  You can
only do this type of operation when you are in a state of OK or
OK_NEW_SCHEMA. [ 91b9e166-d185-4101-b466-1dd231808a9d on
ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]
[ 91b9e166-d185-4101-b466-1dd231808a9d on
ip-10-8-1-70.ap-southeast-2.compute.internal:31010 ]


See:

https://www.dropbox.com/s/akdyfxb98q5adxg/saletest3.tgz?dl=0

On Wed, Jan 7, 2015 at 6:42 PM, Adam Gilmore <dragoncu...@gmail.com> wrote:

> I can definitely put it up somewhere - it's only 72kb (the Parquet file).
> I'm using Hadoop 2.4.0 running on Amazon EMR.  If I get and put it back
> onto HDFS, it still has the same problem, unfortunately.
>
> https://www.dropbox.com/s/nzbg8986mt5t8md/saletest2.tgz?dl=0
>
> I notice in the source that there are different classes for reading
> Parquet from HDFS as opposed to DFS, so I imagine there could be an issue
> there.
>
> I also see the 0.7.0 download is compiled against Hadoop 2.4.1 jars, and I
> imagine it's all backwards compatible so it doesn't seem like a version
> issue.
>
> Let me know how I can assist in reproducing.
>
> On Wed, Jan 7, 2015 at 3:56 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>
>> Nothing is immediately coming to mind.  Out of curiosity, does it still
>> have this problem if you copy the local file back on HDFS and then query
>> it?
>>
>> What version of HDFS are you using?  Is the file something you can share
>> privately or publicly, or is it too large?
>>
>> thanks,
>> Jacques
>>
>> On Tue, Jan 6, 2015 at 9:29 PM, Adam Gilmore <dragoncu...@gmail.com>
>> wrote:
>>
>> > Anyone got any ideas on this one?  I can consistently reproduce the issue
>> > with HDFS - the minute I get the data off HDFS (to a local drive), it all
>> > works fine.
>> >
>> > Doesn't seem to be a problem with Parquet - more like the HDFS storage
>> > engine.
>> >
>> > On Tue, Jan 6, 2015 at 9:50 AM, Adam Gilmore <dragoncu...@gmail.com>
>> > wrote:
>> >
>> > > The data is okay, because the exact same Parquet directory is working fine
>> > > on the local drive, it's just not working when using HDFS.  I tried casting
>> > > as you said, but that ended up with the exact same problem.
>> > >
>> > > On Tue, Jan 6, 2015 at 9:49 AM, MapR <sth...@maprtech.com> wrote:
>> > >
>> > >> Please try casting the column data type. Also, please verify that all the
>> > >> column data satisfies your data type.
>> > >>
>> > >> Sudhakar Thota
>> > >> Sent from my iPhone
>> > >>
>> > >> > On Jan 5, 2015, at 5:56 AM, Adam Gilmore <dragoncu...@gmail.com> wrote:
>> > >> >
>> > >> > The actual stack trace is:
>> > >> >
>> > >> > 2015-01-05 13:48:27,356 [2b5569d5-3771-748d-1390-3a8930d02002:frag:1:12] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
>> > >> > org.apache.drill.common.exceptions.DrillRuntimeException: java.io.IOException: can not read class parquet.format.PageHeader: don't know what type: 13
>> > >> >        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:427) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:158) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:83) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:97) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>> > >> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>> > >> >        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>> > >> > Caused by: java.io.IOException: can not read class parquet.format.PageHeader: don't know what type: 13
>> > >> >        at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:47) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:169) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:366) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:409) ~[drill-java-exec-0.7.0-rebuffed.jar:0.7.0]
>> > >> >        ... 15 common frames omitted
>> > >> > Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 13
>> > >> >        at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.org.apache.thrift.protocol.TCompactProtocol.readListBegin(TCompactProtocol.java:536) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.org.apache.thrift.protocol.TCompactProtocol.readSetBegin(TCompactProtocol.java:547) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:128) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.format.PageHeader.read(PageHeader.java:897) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
>> > >> >        ... 21 common frames omitted
>> > >> >
>> > >> >
>> > >> >> On Mon, Jan 5, 2015 at 6:26 PM, Adam Gilmore <dragoncu...@gmail.com> wrote:
>> > >> >>
>> > >> >> Hi all,
>> > >> >>
>> > >> >> I'm trying to do a really simple query on a parquet directory on HDFS.
>> > >> >>
>> > >> >> This works fine:
>> > >> >>
>> > >> >> select count(*) from hdfs.warehouse.saleparquet
>> > >> >>
>> > >> >> However, this fails:
>> > >> >>
>> > >> >> 0: jdbc:drill:local> select sum(sellprice) from hdfs.warehouse.saleparquet;
>> > >> >> Query failed: Query failed: Failure while running fragment., You tried to
>> > >> >> do a batch data read operation when you were in a state of STOP.  You can
>> > >> >> only do this type of operation when you are in a state of OK or
>> > >> >> OK_NEW_SCHEMA. [ 92fc8807-220b-466c-bbac-1f524d4251cb on
>> > >> >> ip-10-8-1-154.ap-southeast-2.compute.internal:31010 ]
>> > >> >> [ 92fc8807-220b-466c-bbac-1f524d4251cb on
>> > >> >> ip-10-8-1-154.ap-southeast-2.compute.internal:31010 ]
>> > >> >>
>> > >> >>
>> > >> >> Error: exception while executing query: Failure while executing query.
>> > >> >> (state=,code=0)
>> > >> >>
>> > >> >> Seems like a very simple query.
>> > >> >>
>> > >> >> Funnily enough, if I copy it off HDFS to the local system and run the same
>> > >> >> query against the local file, it works fine.  Just purely something to do
>> > >> >> with HDFS.
>> > >> >>
>> > >> >> Any ideas?  I'm running 0.7.
>> > >> >>
>> > >>
>> > >
>> > >
>> >
>>
>
>
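
A note on the innermost exception in the quoted trace, offered as one reading
of it rather than a confirmed diagnosis. The message "don't know what type: 13"
is thrown by TCompactProtocol.getTType, reached through readSetBegin and
readListBegin while TProtocolUtil.skip was skipping a field inside
PageHeader.read. In the Thrift compact protocol, a list or set header packs the
element count into the high nibble of its first byte and the element type into
the low nibble, and the type ids this reader accepts run from 0 to 12, so a low
nibble of 13 is rejected. The Python sketch below mimics that one decoding
step. The function name decode_list_header and the sample bytes are made up for
illustration; this is not Drill's or Parquet's actual code.

```python
# Compact-protocol element type ids 0x00-0x0C. Any other low nibble has no
# mapping here, which is the situation the stack trace reports for 13.
COMPACT_TYPES = {
    0x00: "STOP", 0x01: "BOOLEAN_TRUE", 0x02: "BOOLEAN_FALSE", 0x03: "BYTE",
    0x04: "I16", 0x05: "I32", 0x06: "I64", 0x07: "DOUBLE", 0x08: "BINARY",
    0x09: "LIST", 0x0A: "SET", 0x0B: "MAP", 0x0C: "STRUCT",
}


def decode_list_header(data: bytes, pos: int = 0):
    """Decode a compact-protocol list/set header starting at data[pos].

    Returns (element_type_name, element_count, next_position).
    Hypothetical helper for illustration only.
    """
    first = data[pos]
    pos += 1
    size = first >> 4        # high nibble: element count, or 15 for "long form"
    type_id = first & 0x0F   # low nibble: element type id

    if size == 0x0F:
        # Long form: the real count follows as an unsigned varint.
        size = 0
        shift = 0
        while True:
            b = data[pos]
            pos += 1
            size |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7

    if type_id not in COMPACT_TYPES:
        # Same wording as the TProtocolException in the quoted trace.
        raise ValueError(f"don't know what type: {type_id}")
    return COMPACT_TYPES[type_id], size, pos
```

A header byte of 0x35 decodes as three I32 elements, while 0x2D (count 2, type
13) raises the same message seen above. That would be consistent with the
reader decoding bytes that are not really a page header, for example after
starting at a wrong offset, but nothing in this thread confirms that.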
