I added a JIRA related to this:

https://issues.apache.org/jira/browse/DRILL-4699

On Sun, May 29, 2016 at 6:55 AM, John Omernik <j...@omernik.com> wrote:

> Hey all, when looking at the Drill options, and specifically as I was
> trying to understand the Parquet options, I realized that the naming of the
> options was raising a question as I looked at them. What do I mean?
> Consider:
>
> +--------------------------------------------+
> |                    name                    |
> +--------------------------------------------+
> | store.parquet.block-size                   |
> | store.parquet.compression                  |
> | store.parquet.dictionary.page-size         |
> | store.parquet.enable_dictionary_encoding   |
> | store.parquet.page-size                    |
> | store.parquet.use_new_reader               |
> | store.parquet.vector_fill_check_threshold  |
> | store.parquet.vector_fill_threshold        |
> +--------------------------------------------+
>
>
>
> So I will remove "store.parquet" as I refer to them here:
>
>
> use_new_reader - This seems fairly obviously an "on read" option and
> (maybe?) does affect the Parquet writer, yet "enable_dictionary_encoding"
> is likely ONLY an on-write option... correct? I mean, if the Parquet file
> was written somewhere else, and written with dictionary encoding, Drill
> will still read it OK, regardless of this setting. Same for compression: if
> the Parquet file was created with gzip, and this setting is snappy, Drill
> will still read it. The same goes for block size. Thus, those seem to be
> "writer" settings, rather than reader settings.
>
>
> So what about the vector settings? Write or read (or both)? For JSON there
> is this setting: store.json.writer.uglify, which is obviously writer
> focused going by its name, but for other settings, knowing what the
> setting applies to (on write, on read, neither, or both) could be very
> useful for troubleshooting and knowing which settings to play with.
>
>
> Now, renaming these settings as they stand is not something I'd recommend:
> even in my test clusters, I have scripts that alter them for specific ETLs,
> and I would hate to have things break. But how hard would it be to add a
> string column to sys.options, something like "applies_to", with write, read,
> both, neither, n/a as options? I think this could be valuable for users and
> administrators of Drill.
>
>
> One other note: in addition to the applies_to, would it be horrifically
> difficult to add a "description" field for options? Self-documenting
> settings sure would be handy... :)
>
>
> John
>
>
>
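
To make the proposal in the quoted message a bit more concrete, here is a rough Python mock-up of what rows in sys.options might look like with the suggested "applies_to" and "description" columns. This is only a sketch and not Drill code: the class and function names (AppliesTo, OptionRow, options_for, unclassified) are made up for illustration, the read/write classifications are just the guesses made in the message above, the description strings are placeholders, and options the message does not classify are left as None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AppliesTo(Enum):
    """The values proposed in the thread for an "applies_to" column."""
    READ = "read"
    WRITE = "write"
    BOTH = "both"
    NEITHER = "neither"
    NA = "n/a"


@dataclass(frozen=True)
class OptionRow:
    """Hypothetical shape of one sys.options row with the two new columns."""
    name: str
    applies_to: Optional[AppliesTo]  # None = not classified in the thread
    description: str = ""            # placeholder text, not real Drill docs


# Classifications below are the guesses from the quoted message, nothing more.
OPTIONS: List[OptionRow] = [
    OptionRow("store.parquet.block-size", AppliesTo.WRITE),
    OptionRow("store.parquet.compression", AppliesTo.WRITE,
              "(placeholder) codec used when writing Parquet files"),
    OptionRow("store.parquet.dictionary.page-size", None),
    OptionRow("store.parquet.enable_dictionary_encoding", AppliesTo.WRITE),
    OptionRow("store.parquet.page-size", None),
    OptionRow("store.parquet.use_new_reader", AppliesTo.READ),
    OptionRow("store.parquet.vector_fill_check_threshold", None),
    OptionRow("store.parquet.vector_fill_threshold", None),
    OptionRow("store.json.writer.uglify", AppliesTo.WRITE),
]


def options_for(rows: List[OptionRow], scope: AppliesTo) -> List[str]:
    """Names of options that apply to `scope`; BOTH rows match any scope."""
    return [r.name for r in rows if r.applies_to in (scope, AppliesTo.BOTH)]


def unclassified(rows: List[OptionRow]) -> List[str]:
    """Names of options whose applies_to value is still an open question."""
    return [r.name for r in rows if r.applies_to is None]


if __name__ == "__main__":
    print("read :", options_for(OPTIONS, AppliesTo.READ))
    print("write:", options_for(OPTIONS, AppliesTo.WRITE))
    print("open :", unclassified(OPTIONS))
```

With something like this, someone troubleshooting a slow read could filter for the read-side options only, and the unclassified list shows which settings (the vector ones, for example) would still need an answer from the developers.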
