I have seen some discussions on the Parquet storage format suggesting
that sorting time-series data on the time key before converting to
Parquet will improve range-query efficiency via the min/max statistics
stored per column chunk - perhaps analogous to skip indexes?
Is this a recommended practice?
On 6/1/15, 12:14 PM, Matt bsg...@gmail.com wrote:
Segmenting data into directories in HDFS would require clients to
structure queries accordingly, but would there be benefit in reduced
query time by limiting scan ranges?
Yes. I am just a newbie user, but I have already seen that work with