Re: Time series storage with parquet

2017-11-01 Thread Ted Dunning
Rahul, CTAS plus some file moves are what you need. Do a query against the new file to force the metadata cache to be updated. Also, consider not building the weekly files. You might measure their impact, but I would expect no gain and possibly some loss of performance due to less parallelism. In
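For readers following along, the CTAS-plus-refresh step can be sketched as a short sequence of Drill statements generated from Python. This is a minimal sketch that does not come from the thread. It assumes the following: the `dfs` storage plugin, a writable `dfs.tmp` workspace, a `/data/<year>/<month>` layout, and the helper name `compaction_statements`, which is hypothetical. The file moves happen outside Drill and are only marked by a comment.

```python
from typing import List


def compaction_statements(month_dir: str, staging_table: str,
                          workspace: str = "dfs.tmp") -> List[str]:
    """Return Drill SQL for rewriting one month directory via CTAS.

    Hypothetical helper: the `dfs` plugin, the `dfs.tmp` workspace and the
    directory layout are assumptions for illustration.
    """
    return [
        # 1. Rewrite the month's many small files into a new table. Parquet
        #    is Drill's default output format (store.format).
        f"CREATE TABLE {workspace}.`{staging_table}` AS "
        f"SELECT * FROM dfs.`{month_dir}`",
        # 2. (Outside Drill) move the new files into month_dir and delete the
        #    old ones; no SQL is generated for this step.
        # 3. Regenerate the Parquet metadata cache for the directory.
        f"REFRESH TABLE METADATA dfs.`{month_dir}`",
        # 4. A query against the rewritten directory, as suggested above.
        f"SELECT COUNT(*) FROM dfs.`{month_dir}`",
    ]


for stmt in compaction_statements("/data/2017/10", "staging_2017_10"):
    print(stmt + ";")
```

The sketch includes both the explicit `REFRESH TABLE METADATA` and a plain query. Which of the two is actually needed to pick up the new files may depend on your Drill version's metadata-cache behavior.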

Re: Time series storage with parquet

2017-11-01 Thread Rahul Raj
Hi Padma, I could see a merge command in parquet-tools. I have not tested it myself, but wanted to check whether it would work with Apache Drill. https://issues.apache.org/jira/browse/PARQUET-460 https://github.com/apache/parquet-mr/blob/master/parquet-tools/src/main/java/org/apache/parquet/t

Re: Time series storage with parquet

2017-10-31 Thread Padma Penumarthy
parquet-tools can be used only for inspecting Parquet files, not for creating new ones. Yes, you can use CTAS to do this. You have to manually remove the old files and move the new files into place. This does affect the metadata caching mechanism: you need to regenerate the metadata cache. Thanks Pad
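On a local filesystem, the manual step described here (remove the old files, move the new ones in) could look like the sketch below. It is only an illustration. The function name `swap_in_compacted`, the `.old` backup suffix and the directory names are hypothetical. On HDFS or MapR-FS you would use that filesystem's own move and delete operations instead. The two renames are not atomic together, so a query that runs between them could briefly find the directory missing.

```python
import os
import shutil


def swap_in_compacted(month_dir: str, staging_dir: str) -> None:
    """Replace the files in month_dir with the CTAS output in staging_dir.

    Hypothetical helper; assumes both directories are on the same local
    filesystem so that os.rename is a cheap move.
    """
    backup_dir = month_dir.rstrip(os.sep) + ".old"
    # Keep the old files aside until the new ones are in place.
    os.rename(month_dir, backup_dir)
    # Move the compacted output (e.g. Drill's 0_0_0.parquet) into position.
    os.rename(staging_dir, month_dir)
    # Only now delete the old small files, including any stale
    # .drill.parquet_metadata cache file that lived in the old directory.
    shutil.rmtree(backup_dir)
    # Afterwards, regenerate the metadata cache, e.g. with
    # REFRESH TABLE METADATA on the table root.
```

Keeping the old files under a backup name until the new ones are in place means a failed second rename leaves the original data recoverable.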

Time series storage with parquet

2017-10-31 Thread Rahul Raj
Hi, I have a few questions on modeling a time series use case with Parquet and Drill. I have seen the topic discussed at https://issues.apache.org/jira/browse/DRILL-3534. My requirements are:
* Keep the parquet files partitioned by year and month
* For the current month, the data needs to be furth
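One common way to get a year/month partitioning like the one described here is nested directories. Drill exposes these through its implicit `dir0` and `dir1` columns, so filters on those columns can prune whole directories. The sketch below assumes a `<root>/<yyyy>/<mm>` layout and uses hypothetical helper names. It does not cover the extra partitioning of the current month, which is cut off in this excerpt.

```python
from datetime import date


def partition_dir(root: str, d: date) -> str:
    """Directory that holds data for date d under <root>/<yyyy>/<mm>."""
    return f"{root.rstrip('/')}/{d.year:04d}/{d.month:02d}"


def month_filter(d: date) -> str:
    """WHERE-clause fragment selecting d's month via Drill's implicit
    directory columns (dir0 = first level below the queried path)."""
    return f"dir0 = '{d.year:04d}' AND dir1 = '{d.month:02d}'"


d = date(2017, 10, 31)
print(partition_dir("/data", d))
print(f"SELECT * FROM dfs.`/data` WHERE {month_filter(d)}")
```

Zero-padding the month keeps directory names sortable and makes the string comparison on `dir1` unambiguous.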