
Sorting and partitioning for range scans?

2015-06-01 Thread Matt

I have seen some discussions of the Parquet storage format suggesting that sorting time-series data on the time key before converting it to Parquet will improve range-query efficiency, through the min/max values stored for each column chunk (perhaps analogous to skip indexes?). Is this a recommended
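A rough illustration of why sorting helps with min/max statistics, as a minimal sketch that does not use any real Parquet library: each hypothetical `RowGroup` stands in for a chunk of rows with its stored min and max, and `range_scan` skips any chunk whose [min, max] cannot overlap the query range. The group size, the integer "timestamps", and all names here are assumptions for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group: its values plus min/max stats."""
    values: list
    min_val: int
    max_val: int


def build_row_groups(timestamps, group_size):
    """Split the input, in its current order, into fixed-size groups with min/max stats."""
    groups = []
    for i in range(0, len(timestamps), group_size):
        chunk = timestamps[i:i + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def range_scan(groups, lo, hi):
    """Return (matching values, number of groups actually read) for lo <= v <= hi."""
    matches = []
    groups_read = 0
    for g in groups:
        # Skip the whole group when its [min, max] cannot overlap [lo, hi].
        if g.max_val < lo or g.min_val > hi:
            continue
        groups_read += 1
        matches.extend(v for v in g.values if lo <= v <= hi)
    return matches, groups_read


if __name__ == "__main__":
    rng = random.Random(0)
    shuffled = list(range(10_000))
    rng.shuffle(shuffled)

    for label, data in [("unsorted", shuffled), ("sorted", sorted(shuffled))]:
        groups = build_row_groups(data, group_size=1_000)
        rows, read = range_scan(groups, 2_500, 2_999)
        print(f"{label}: {len(rows)} rows, read {read} of {len(groups)} groups")
```

With sorted input the query for 2,500 to 2,999 reads a single group (the one spanning 2,000 to 2,999); with shuffled input each group's min/max typically spans nearly the whole key range, so little or nothing can be skipped even though the same rows are returned.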

Re: Sorting and partitioning for range scans?

2015-06-01 Thread Paul Mogren

On 6/1/15, 12:14 PM, Matt bsg...@gmail.com wrote:

> Segmenting data into directories in HDFS would require clients to structure queries accordingly, but would there be benefit in reduced query time by limiting scan ranges?

Yes. I am just a newbie user, but I have already seen that work with
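To make the "limiting scan ranges" idea concrete, here is a minimal sketch of directory-based partition pruning. It assumes a hypothetical one-directory-per-day layout such as `events/2015/06/01`; neither the layout nor the helper names come from this thread. Only directories whose date falls inside the query range would be handed to the scanner, which is also why clients must phrase their filters in terms that map onto the directory levels.

```python
from datetime import date


def parse_day(path):
    """Parse a hypothetical '<root>/YYYY/MM/DD' directory path into a date."""
    year, month, day = path.rstrip("/").split("/")[-3:]
    return date(int(year), int(month), int(day))


def prune_partitions(paths, start, end):
    """Keep only the day directories whose date falls in [start, end]."""
    return [p for p in paths if start <= parse_day(p) <= end]


if __name__ == "__main__":
    paths = [
        "events/2015/05/30",
        "events/2015/05/31",
        "events/2015/06/01",
        "events/2015/06/02",
    ]
    selected = prune_partitions(paths, date(2015, 5, 31), date(2015, 6, 1))
    print(selected)  # ['events/2015/05/31', 'events/2015/06/01']
```

Directories outside the range are never opened, so the saving grows with the number of partitions the filter excludes.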