Hey all, to facilitate loading some data from JSON to Parquet, I am loading into "day"-based directories...
parqtable
|
|_______2015-11-01
|
|_______2015-11-02
|
|_______2015-11-03
|
|_______2015-11-04

That way I can do

    select * from `parqtable` where dir0 = '2015-11-01'

and other cool tricks. It also helps my data loading. I am using the exact same query to load each day:

    CREATE TABLE `parqtable/2015-11-01` as (select field1, field2, field3, field4 from jsontable where dir0 = '2015-11-01')
    CREATE TABLE `parqtable/2015-11-02` as (select field1, field2, field3, field4 from jsontable where dir0 = '2015-11-02')
    CREATE TABLE `parqtable/2015-11-03` as (select field1, field2, field3, field4 from jsontable where dir0 = '2015-11-03')
    Etc

This seems to work well except for one thing. If I want to see the count per directory, this (what I thought was obvious) query:

    select dir0, count(*) from `parqtable` group by dir0

fails with

    Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
    Fragment: 2:8

I am not sure why this would be the case. The data is loaded by the same query, so I would assume the schema is the same in every directory.

Thoughts on how to troubleshoot?

Thanks!

John
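
The per-day load statements above follow a fixed pattern, so they could be generated rather than typed by hand. Below is a minimal sketch in Python, assuming the table and field names from the post. The function name `ctas_statements` and its parameters are hypothetical, and the code only builds the SQL strings; it does not submit them to the query engine.

```python
from datetime import date, timedelta


def ctas_statements(start, end, source="jsontable", target="parqtable",
                    fields=("field1", "field2", "field3", "field4")):
    """Yield one CREATE TABLE AS statement per day, from start to end inclusive.

    The names here are illustrative assumptions taken from the post,
    not a real loader API.
    """
    cols = ", ".join(fields)
    day = start
    while day <= end:
        d = day.isoformat()  # e.g. '2015-11-01', matching the directory names
        yield (f"CREATE TABLE `{target}/{d}` AS "
               f"(SELECT {cols} FROM {source} WHERE dir0 = '{d}')")
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the statements for the first three days shown in the post.
    for stmt in ctas_statements(date(2015, 11, 1), date(2015, 11, 3)):
        print(stmt)
```

Generating the statements from one template keeps the column list identical for every day, which rules out a hand-typing difference between loads as the source of a schema mismatch.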
