I added a JIRA related to this: https://issues.apache.org/jira/browse/DRILL-4699
On Sun, May 29, 2016 at 6:55 AM, John Omernik <j...@omernik.com> wrote:

> Hey all, when looking at the drill options, and specifically as I was
> trying to understand the parquet options, I realized that the naming of the
> options was forming "question" as I looked at them. What do I mean?
> Consider:
>
> +--------------------------------------------+
> | name                                       |
> +--------------------------------------------+
> | store.parquet.block-size                   |
> | store.parquet.compression                  |
> | store.parquet.dictionary.page-size         |
> | store.parquet.enable_dictionary_encoding   |
> | store.parquet.page-size                    |
> | store.parquet.use_new_reader               |
> | store.parquet.vector_fill_check_threshold  |
> | store.parquet.vector_fill_threshold        |
> +--------------------------------------------+
>
> So I will remove "store.parquet" as I refer to them here:
>
> use_new_reader - This seems fairly obvious an "on read" options and
> (maybe?) does affect the Parquet writer, yet "enable_dictionary_encoding"
> is likely ONLY an on write option.... correct? I mean, if the Parquet file
> was written somewhere else, and written with Dictionary encoding, Drill
> will still read it ok, regardless of this setting. Compression as well, if
> the Parquet file was created with gzip, and this setting is snappy, it will
> still read it, same goes for block size. Thus, those seem to be "writer"
> settings, rather than reader settings.
>
> So what about the vector settings? Write or Read (or both?) For json there
> is this setting: | store.json.writer.uglify which seems to be writer
> focused and obviously writer, but for other settings, knowing what the
> setting applies to, on write, on read, neither, or both, could be very
> useful for troubleshooting and knowing which settings to play with.
>
> Now, changing these settings as they are is not recommended, even in my
> test clusters, I have scripts that alter them for specific ETLs, and I
> would hate to have things break, but how hard would it be to add a string
> column to sys.options something like "applies_to" with write, read, both,
> neither, n/a as options? I think this could be valuable for users and
> administrators of Drill.
>
> One other note, in addition to the applies_to, would it be horrifically
> difficult to add a "description" field for options? Self documenting
> settings sure would be handy.... :)
>
> John