Hi,

I have a few questions about modeling a time series use case with Parquet
and Drill. I have seen the topic discussed at
https://issues.apache.org/jira/browse/DRILL-3534.

My requirements are:

* Keep the Parquet files partitioned by year and month
* For the current month, further partition the data by week and day
* At the end of the running week, merge the 7 daily Parquet files into a
single weekly file
* Similarly, at month end, merge the weekly files into a single monthly
file
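
To make the layout above concrete, here is a minimal sketch of how the
directory for a given day might be derived. The root path, the `weekN`
directory naming, and the week-of-month rule (days 1-7 are week 1, and so on)
are assumptions made for illustration only, not something Drill requires.

```python
from datetime import date


def partition_path(root: str, d: date, today: date) -> str:
    """Return the directory expected to hold the data for day `d`.

    Hypothetical layout: <root>/<yyyy>/<mm>[/week<N>[/<dd>]].
    """
    base = f"{root}/{d.year:04d}/{d.month:02d}"

    # Other months are assumed to be already merged into one monthly file.
    if (d.year, d.month) != (today.year, today.month):
        return base

    # Assumption: week-of-month, where days 1-7 fall in week 1.
    week = (d.day - 1) // 7 + 1
    week_dir = f"{base}/week{week}"

    # Weeks other than the running one are assumed merged into a weekly file.
    current_week = (today.day - 1) // 7 + 1
    if week != current_week:
        return week_dir

    # The running week keeps one directory per day.
    return f"{week_dir}/{d.day:02d}"
```

With this rule the last "week" of a month can be shorter than 7 days; ISO
week numbers would be an alternative, at the cost of weeks that span two
months.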

I will have a web application that generates the daily data and takes care
of batch runs, atomic writes, locking, etc.

What are the possible ways to merge Parquet files? Another CTAS?
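
For reference, the CTAS-based merge I have in mind would look roughly like
the statement built below. This is only a sketch: the `dfs.tmp` target
workspace, the table name, and the source directory are hypothetical, and the
number of files the CTAS writes may depend on how Drill parallelizes the
query.

```python
def merge_ctas(target: str, source_dir: str) -> str:
    """Build a Drill CTAS statement that rewrites every file under
    `source_dir` into a new table named `target`.

    Assumption: `dfs.tmp` is a writable workspace and `dfs` can read
    `source_dir`.
    """
    return (
        f"CREATE TABLE dfs.tmp.`{target}` AS\n"
        f"SELECT * FROM dfs.`{source_dir}`"
    )


# Example: merge the daily files of one week into a weekly table.
print(merge_ctas("2016_04_week1", "/data/ts/2016/04/week1"))
```

The application would then swap the merged output into place and remove the
daily directories once the statement succeeds.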

Is it possible to use parquet-tools (part of Parquet-MR) to merge multiple
Parquet files (java -jar ./parquet-tools-<VERSION>.jar <command> <input-directory>
<output-file>) and then let Drill query the results? Would that affect
Drill's metadata caching mechanism?

Regards,
Rahul
