Re: Parquet Files Loaded to "directory" partitions Error on Grouping by dir0

John Omernik Thu, 05 Nov 2015 08:17:31 -0800

I am getting the same error as before.  I am using the same query to create
my tables, so I am unsure why it would see the schema as different. Is
there way to view the schema for each directory to see what Drill thinks is
happening?


On Thu, Nov 5, 2015 at 9:35 AM, Andries Engelbrecht <
[email protected]> wrote:

> Have you tried to create a view on top of parqtable, and then use the view?
>
> I did a quick experiment that may help you.
>
> Created a dir structure as follows (using csv files - what I had on hand)
> orders
> |____month1
> |____month2
> .
> .
> |____month9
>
> create or replace view dfs.views.ordersdir as select dir0, * from
> dfs.data.`/orders`;
>
> 0: jdbc:drill:> select * from dfs.views.ordersdir limit 2;
>
> +---------+-------------------------------------------------------------------+---------+
> |  dir0   |                              columns
>     |  dir00  |
>
> +---------+-------------------------------------------------------------------+---------+
> | month1  | ["8000","January","2014-01-02
> 23:06:07","15349","ri","203","26"]  | month1  |
> | month1  | ["8001","January","2014-01-31
> 03:40:21","16033","oh","113","23"]  | month1  |
>
> +---------+-------------------------------------------------------------------+————+
>
> (interesting to note dir00 in the output)
>
> But the group by on the dir works as it is a column defined in the view
>
> 0: jdbc:drill:> select dir0, count(*) from dfs.views.ordersdir group by
> dir0;
> +---------+---------+
> |  dir0   | EXPR$1  |
> +---------+---------+
> | month1  | 9000    |
> | month6  | 16000   |
> | month9  | 15131   |
> | month2  | 12000   |
> | month3  | 16000   |
> | month7  | 15573   |
> | month5  | 16000   |
> | month4  | 12000   |
> | month8  | 15390   |
> +---------+————+
>
> See if that works in your case.
>
>
> —Andries
>
>
> > On Nov 5, 2015, at 6:48 AM, John Omernik <[email protected]> wrote:
> >
> > Hey all,  to facilitate loading of some data from JSON to Parquet, I am
> > using the the load into "day" based directories...
> >
> > parqtable
> > |
> > |_______2015-11-01
> > |
> > |_______2015-11-02
> > |
> > |_______2015-11-03
> > |
> > |_______2015-11-04
> >
> > That way I can do select * from `parqtable` where dir0 = '2015-11-01' and
> > other cool tricks. It also helps my data loading.
> >
> > I am using the exact same query to load each day.
> >
> > CREATE TABLE `parqtable/2015-11-01' as
> > (select field1, field2, field3, field4 from jsontable where dir0 =
> > '2015-11-01')
> >
> > CREATE TABLE `parqtable/2015-11-02' as
> > (select field1, field2, field3, field4 from jsontable where dir0 =
> > '2015-11-02')
> >
> > CREATE TABLE `parqtable/2015-11-03' as
> > (select field1, field2, field3, field4 from jsontable where dir0 =
> > '2015-11-03')
> >
> > Etc
> >
> > This seams to work well except for one thing:
> >
> > If I want to see the count per directory, this (what I thought was
> obvious)
> > query:
> >
> > select dir0, count(*) from `parqtable` group by dir0
> >
> > fails with
> >
> > Error: UNSUPPORTED_OPERATION_ ERROR: Hash aggregate does not support
> schema
> > changes
> >
> > Fragment: 2:8
> >
> >
> > I am not sure why this would be the case, the data is loaded by the same
> > query, I would assume the schema is the same....
> >
> > Thoughts on how to troubleshoot?
> >
> > Thanks!
> >
> > John
>
>

Re: Parquet Files Loaded to "directory" partitions Error on Grouping by dir0

Reply via email to