So we are off to a flying start :)

On Thu, Oct 29, 2015 at 9:50 PM, Stefán Baxter <ste...@activitystream.com>
wrote:

> Hi,
>
> We are using Avro, JSON and Parquet for collecting various types of data
> for analytical processing.
>
> I had not used Parquet before we started to play around with Drill, and
> now I'm wondering if we are planning our data structures correctly and if we
> will be able to get the most out of Drill+Parquet.
>
> I have some questions and I hope the answers can be turned into a Best
> Practices document.
>
> So here we go:
>
>    - Are there any rules that we must abide by to make scanning of
>    "low-cardinality" columns as effective as it can be?
>    - As I understand it, the Parquet dictionary is scanned for the
>    value(s), and if they are not in the dictionary the section is ignored
>
>    - Can dictionary based scanning (as described above) work on arrays?
>    - like: {"some":"simple","tags":["blue","green","yellow"]}
>
>    - If I have multiple files containing a day's worth of logging, in
>    chronological order, will all the irrelevant files be ignored when looking
>    for a date or a date range?
>    - AKA - Will the min-max headers in Parquet be used to prevent
>    scanning of data outside the range?
>
>    - Is there anything I need to do to make sure that the write
>    optimizations in Parquet are used?
>    - dictionaries for low cardinality fields
>    - "number folding" for numerical sequences
>    - compression etc.
>
>    - Are there any Parquet features that are not available in Drill?
>    - I know Drill is using a fork of Parquet and I wonder if any major
>    improvements in Parquet are unavailable
>
>    - Storing Dates with timezone information (stored in two separate
>    fields?)
>    - What is the common approach?
>
>    - Are there any caveats in converting Avro to Parquet?
>    - other than converting Unix dates from Avro (only long
>    available) into timestamp fields in Parquet
>
>
> There will, in all likelihood, be future installments of this entry as new
> questions arise.
>
> All help is appreciated.
>
> Regards,
>  -Stefan
>
>
>
