Re: Can't query parquet on HDFS

2015-01-07 Thread Adam Gilmore
Hi Jacques, I debugged and found the issue. I have a JIRA issue going at: https://issues.apache.org/jira/browse/DRILL-1948 If someone could point me in the right direction of that Parquet fork, I can probably submit a patch for this and resolve it. On Wed, Jan 7, 2015 at 3:56 PM, Jacques Nadea

Re: Can't query parquet on HDFS

2015-01-07 Thread Adam Gilmore
P.S. For a more extreme example (1M rows) that returns: Query failed: Query failed: Failure while running fragment., You tried to do a batch data read operation when you were in a state of STOP. You can only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA. [ 91b9e166-d185

Re: Can't query parquet on HDFS

2015-01-07 Thread Adam Gilmore
I can definitely put it up somewhere - it's only 72kb (the Parquet file). I'm using Hadoop 2.4.0 running on Amazon EMR. If I get and put it back onto HDFS, it still has the same problem, unfortunately. https://www.dropbox.com/s/nzbg8986mt5t8md/saletest2.tgz?dl=0 I notice in the source that there

Re: Can't query parquet on HDFS

2015-01-06 Thread Jacques Nadeau
Nothing is immediately coming to mind. Out of curiosity, does it still have this problem if you copy the local file back onto HDFS and then query it? What version of HDFS are you using? Is the file something you can share privately or publicly, or is it too large? thanks, Jacques On Tue, Jan 6, 2015 a

Re: Can't query parquet on HDFS

2015-01-06 Thread Adam Gilmore
Anyone got any ideas on this one? I can consistently reproduce the issue with HDFS - the minute I get the data off HDFS (to a local drive), it all works fine. Doesn't seem to be a problem with Parquet - more like the HDFS storage engine. On Tue, Jan 6, 2015 at 9:50 AM, Adam Gilmore wrote: > Th

Re: Can't query parquet on HDFS

2015-01-05 Thread Adam Gilmore
The data is okay, because the exact same Parquet directory is working fine on the local drive, it's just not working when using HDFS. I tried casting as you said, but that ended up with the exact same problem. Regards, *Adam Gilmore* Director of Technology a...@pharmadata.net.au +61 421 99

Re: Can't query parquet on HDFS

2015-01-05 Thread Adam Gilmore
The data is okay, because the exact same Parquet directory is working fine on the local drive, it's just not working when using HDFS. I tried casting as you said, but that ended up with the exact same problem. On Tue, Jan 6, 2015 at 9:49 AM, MapR wrote: > Please try casting the column data type.

Re: Can't query parquet on HDFS

2015-01-05 Thread MapR
Please try casting the column data type. Also please verify that all the column data satisfies your data type. Sudhakar Thota Sent from my iPhone > On Jan 5, 2015, at 5:56 AM, Adam Gilmore wrote: > > The actual stack trace is: > > 2015-01-05 13:48:27,356 [2b5569d5-3771-748d-1390-3a8930d020

Re: Can't query parquet on HDFS

2015-01-05 Thread Adam Gilmore
The actual stack trace is: 2015-01-05 13:48:27,356 [2b5569d5-3771-748d-1390-3a8930d02002:frag:1:12] ERROR o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. org.apache.drill.common.exceptions.DrillRuntimeException: java.io.IOException: can not read class parquet.format.PageHea

Can't query parquet on HDFS

2015-01-05 Thread Adam Gilmore
Hi all, I'm trying to do a really simple query on a parquet directory on HDFS. This works fine: select count(*) from hdfs.warehouse.saleparquet However, this fails: 0: jdbc:drill:local> select sum(sellprice) from hdfs.warehouse.saleparquet; Query failed: Query failed: Failure while running fra