Re: Parquet Files Loaded to "directory" partitions Error on Grouping by dir0

Andries Engelbrecht Thu, 05 Nov 2015 07:36:22 -0800

Have you tried to create a view on top of parqtable, and then use the view?


I did a quick experiment that may help you.

Created a dir structure as follows (using csv files - what I had on hand)
orders
|____month1
|____month2
.
.
|____month9

create or replace view dfs.views.ordersdir as select dir0, * from 
dfs.data.`/orders`;

0: jdbc:drill:> select * from dfs.views.ordersdir limit 2;
+---------+-------------------------------------------------------------------+---------+
|  dir0   |                              columns                              | 
 dir00  |
+---------+-------------------------------------------------------------------+---------+
| month1  | ["8000","January","2014-01-02 23:06:07","15349","ri","203","26"]  | 
month1  |
| month1  | ["8001","January","2014-01-31 03:40:21","16033","oh","113","23"]  | 
month1  |
+---------+-------------------------------------------------------------------+————+

(interesting to note dir00 in the output)

But the group by on the dir works as it is a column defined in the view

0: jdbc:drill:> select dir0, count(*) from dfs.views.ordersdir group by dir0;
+---------+---------+
|  dir0   | EXPR$1  |
+---------+---------+
| month1  | 9000    |
| month6  | 16000   |
| month9  | 15131   |
| month2  | 12000   |
| month3  | 16000   |
| month7  | 15573   |
| month5  | 16000   |
| month4  | 12000   |
| month8  | 15390   |
+---------+————+

See if that works in your case.


—Andries


> On Nov 5, 2015, at 6:48 AM, John Omernik <[email protected]> wrote:
> 
> Hey all,  to facilitate loading of some data from JSON to Parquet, I am
> using the the load into "day" based directories...
> 
> parqtable
> |
> |_______2015-11-01
> |
> |_______2015-11-02
> |
> |_______2015-11-03
> |
> |_______2015-11-04
> 
> That way I can do select * from `parqtable` where dir0 = '2015-11-01' and
> other cool tricks. It also helps my data loading.
> 
> I am using the exact same query to load each day.
> 
> CREATE TABLE `parqtable/2015-11-01' as
> (select field1, field2, field3, field4 from jsontable where dir0 =
> '2015-11-01')
> 
> CREATE TABLE `parqtable/2015-11-02' as
> (select field1, field2, field3, field4 from jsontable where dir0 =
> '2015-11-02')
> 
> CREATE TABLE `parqtable/2015-11-03' as
> (select field1, field2, field3, field4 from jsontable where dir0 =
> '2015-11-03')
> 
> Etc
> 
> This seams to work well except for one thing:
> 
> If I want to see the count per directory, this (what I thought was obvious)
> query:
> 
> select dir0, count(*) from `parqtable` group by dir0
> 
> fails with
> 
> Error: UNSUPPORTED_OPERATION_ ERROR: Hash aggregate does not support schema
> changes
> 
> Fragment: 2:8
> 
> 
> I am not sure why this would be the case, the data is loaded by the same
> query, I would assume the schema is the same....
> 
> Thoughts on how to troubleshoot?
> 
> Thanks!
> 
> John

Re: Parquet Files Loaded to "directory" partitions Error on Grouping by dir0

Reply via email to