Thanks. Yes. I'm going to try the renaming approach. Not a rant but isn't the whole point of a "storage-plugins-override.conf" to override storage plugin configuration?
Btw: I'm in embedded mode. So I guess I can also use the config files from /tmp/drill after "fixing" the format configuration in the UI and use them e.g. in a Docker image.

regards
Stefan

On Wed, Jul 12, 2023 at 6:04 PM Charles Givre <[email protected]> wrote:

> My sense of what is happening in your use case is that the configs that
> exist in the UI are overriding the conf file. What it seems like you want
> is the opposite order of precedence. I've never used the conf files for
> this, so I don't have a lot of experience with that, but it would seem that
> the best way to get your Drill cluster configured to do what you want is to
> delete or disable the configs in the UI and only use the ones in the config
> file.
>
> By conflicting I meant that let's say that you have a plugin called dfs
> that has the json format enabled. If you put a configuration for a plugin
> also called dfs in the conf file, what I think is happening is that since
> you have two plugins with the same name, Drill will read the ones from the
> UI. (FYSA, they aren't actually stored in the UI. If you are using Drill
> in distributed mode, those configurations are stored in ZooKeeper. If you
> are in embedded mode, they are stored on your drive somewhere.)
>
> Anyway, IMHO, the best thing to do would be to make sure that the plugins
> in your conf file do not have the same names as the plugins that appear in
> the UI. That's what I was getting at. Does that make sense?
> Best,
> -- C
>
>
> > On Jul 12, 2023, at 11:57 AM, Stefan Ziegler <[email protected]> wrote:
> >
> > Hi Charles
> >
> > not sure if I understand you correctly: what do you mean by “not
> > conflicting”? My attempt is to not use the UI at all to configure storages.
> > I thought this can be achieved by overriding the default storages with the
> > “override” file. This seems to work except the strange behaviour with the
> > formats.
> >
> > regards
> > Stefan
> >
> > Sent from Outlook for iOS<https://aka.ms/o0ukef>
> > ________________________________
> > From: Charles Givre <[email protected]>
> > Sent: Wednesday, July 12, 2023 5:04 PM
> > To: user <[email protected]>
> > Subject: Re: Respecting formats restriction when using storage-plugins-override.conf
> >
> > Hi Stefan,
> > My biggest piece of advice here would just be to make sure the plugins
> > specified in the override file do not conflict with the UI-based configs.
> > It may make sense to have completely different configs in each location.
> > IE:
> >
> > dfs-conf and (plain) dfs.
> >
> > I think that should solve all issues. In theory if you remove a config
> > from the "formats" section, Drill should not be able to parse the file in
> > question. So for example if you don't have the 'csv' format or 'excel'
> > then Drill will not be able to parse those formats.
> >
> > Best,
> > -- C
> >
> >
> >> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <[email protected]> wrote:
> >>
> >> The config for dfs in the UI looks like this:
> >>
> >> {
> >>   "type": "file",
> >>   "connection": "file:///",
> >>   "workspaces": {
> >>     "tmp": {
> >>       "location": "/tmp",
> >>       "writable": true,
> >>       "defaultInputFormat": null,
> >>       "allowAccessOutsideWorkspace": false
> >>     },
> >>     "root": {
> >>       "location": "/",
> >>       "writable": false,
> >>       "defaultInputFormat": null,
> >>       "allowAccessOutsideWorkspace": false
> >>     },
> >>     "home": {
> >>       "location": "/Users/stefan",
> >>       "writable": true,
> >>       "defaultInputFormat": null,
> >>       "allowAccessOutsideWorkspace": false
> >>     }
> >>   },
> >>   "formats": {
> >>     "parquet": {
> >>       "type": "parquet"
> >>     },
> >>     "json": {
> >>       "type": "json",
> >>       "extensions": [
> >>         "json"
> >>       ]
> >>     },
> >>     "excel": {
> >>       "type": "excel",
> >>       "extensions": [
> >>         "xlsx"
> >>       ],
> >>       "lastRow": 1048576,
> >>       "ignoreErrors": true,
> >>       "maxArraySize": -1,
> >>       "thresholdBytesForTempFiles": -1
> >>     },
> >>     "spss": {
> >>       "type": "spss",
> >>       "extensions": [
> >>         "sav"
> >>       ]
> >>     },
> >>     "iceberg": {
> >>       "type": "iceberg",
> >>       "properties": null,
> >>       "caseSensitive": null,
> >>       "includeColumnStats": null,
> >>       "ignoreResiduals": null,
> >>       "snapshotId": null,
> >>       "snapshotAsOfTime": null,
> >>       "fromSnapshotId": null,
> >>       "toSnapshotId": null
> >>     },
> >>     "httpd": {
> >>       "type": "httpd",
> >>       "extensions": [
> >>         "httpd"
> >>       ],
> >>       "logFormat": "common\ncombined"
> >>     },
> >>     "xml": {
> >>       "type": "xml",
> >>       "extensions": [
> >>         "xml"
> >>       ],
> >>       "dataLevel": 1
> >>     },
> >>     "syslog": {
> >>       "type": "syslog",
> >>       "extensions": [
> >>         "syslog"
> >>       ],
> >>       "maxErrors": 10
> >>     },
> >>     "msaccess": {
> >>       "type": "msaccess",
> >>       "extensions": [
> >>         "mdb",
> >>         "accdb"
> >>       ]
> >>     },
> >>     "hdf5": {
> >>       "type": "hdf5",
> >>       "extensions": [
> >>         "h5"
> >>       ],
> >>       "defaultPath": null
> >>     },
> >>     "ltsv": {
> >>       "type": "ltsv",
> >>       "extensions": [
> >>         "ltsv"
> >>       ],
> >>       "parseMode": "lenient",
> >>       "escapeCharacter": null,
> >>       "kvDelimiter": null,
> >>       "entryDelimiter": null,
> >>       "lineEnding": null,
> >>       "quoteChar": null
> >>     },
> >>     "delta": {
> >>       "type": "delta",
> >>       "version": null,
> >>       "timestamp": null
> >>     },
> >>     "shp": {
> >>       "type": "shp",
> >>       "extensions": [
> >>         "shp"
> >>       ]
> >>     },
> >>     "image": {
> >>       "type": "image",
> >>       "extensions": [
> >>         "jpg",
> >>         "jpeg",
> >>         "jpe",
> >>         "tif",
> >>         "tiff",
> >>         "dng",
> >>         "psd",
> >>         "png",
> >>         "bmp",
> >>         "gif",
> >>         "ico",
> >>         "pcx",
> >>         "wav",
> >>         "wave",
> >>         "avi",
> >>         "webp",
> >>         "mov",
> >>         "mp4",
> >>         "m4a",
> >>         "m4p",
> >>         "m4b",
> >>         "m4r",
> >>         "m4v",
> >>         "3gp",
> >>         "3g2",
> >>         "eps",
> >>         "epsf",
> >>         "epsi",
> >>         "ai",
> >>         "arw",
> >>         "crw",
> >>         "cr2",
> >>         "nef",
> >>         "orf",
> >>         "raf",
> >>         "rw2",
> >>         "rwl",
> >>         "srw",
> >>         "x3f"
> >>       ],
> >>       "fileSystemMetadata": true,
> >>       "descriptive": true
> >>     },
> >>     "pdf": {
> >>       "type": "pdf",
> >>       "extensions": [
> >>         "pdf"
> >>       ],
> >>       "extractHeaders": true,
> >>       "extractionAlgorithm": "basic"
> >>     },
> >>     "sas": {
> >>       "type": "sas",
> >>       "extensions": [
> >>         "sas7bdat"
> >>       ]
> >>     },
> >>     "pcap": {
> >>       "type": "pcap",
> >>       "extensions": [
> >>         "pcap",
> >>         "pcapng"
> >>       ]
> >>     }
> >>   },
> >>   "authMode": "SHARED_USER",
> >>   "enabled": true
> >> }
> >>
> >> I'm now able to query some XML data: "SELECT * FROM
> >> dfs.home.`ch.so.afu.abbaustellen.xml`;" Which I actually don't want to be
> >> able to (see formats in the "storage-plugins-override.conf" file). If I
> >> remove the xml format section in the config in the UI, I'm not able to
> >> query the xml anymore: "Error: VALIDATION ERROR: From line 1, column 15 to
> >> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
> >> 'dfs.home'".
> >>
> >> regards
> >> Stefan
> >>
> >>
> >> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <[email protected]> wrote:
> >>
> >>> Hi Stefan,
> >>> What's in the config in the UI? Can you also please clarify what queries
> >>> are running which indicate that your configs aren't working?
> >>> Best,
> >>> -- C
> >>>
> >>>
> >>>
> >>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <[email protected]> wrote:
> >>>>
> >>>> "storage": {
> >>>>   cp: {
> >>>>     type: "file",
> >>>>     connection: "classpath:///",
> >>>>     formats: {
> >>>>       "csv" : {
> >>>>         type: "text",
> >>>>         extensions: [ "csv" ],
> >>>>         delimiter: ","
> >>>>       }
> >>>>     }
> >>>>     enabled: true
> >>>>   }
> >>>> }
> >>>> "storage": {
> >>>>   dfs: {
> >>>>     type: "file",
> >>>>     connection: "file:///",
> >>>>     workspaces: {
> >>>>       "tmp": {
> >>>>         "location": "/tmp",
> >>>>         "writable": true,
> >>>>         "defaultInputFormat": null,
> >>>>         "allowAccessOutsideWorkspace": false
> >>>>       },
> >>>>       "home": {
> >>>>         "location": "/Users/stefan",
> >>>>         "writable": true,
> >>>>         "defaultInputFormat": null,
> >>>>         "allowAccessOutsideWorkspace": false
> >>>>       },
> >>>>       "root": {
> >>>>         "location": "/",
> >>>>         "writable": false,
> >>>>         "defaultInputFormat": null,
> >>>>         "allowAccessOutsideWorkspace": false
> >>>>       }
> >>>>     },
> >>>>     formats: {
> >>>>       "parquet": {
> >>>>         "type": "parquet"
> >>>>       },
> >>>>       "json": {
> >>>>         "type": "json",
> >>>>         "extensions": [
> >>>>           "json"
> >>>>         ]
> >>>>       }
> >>>>     },
> >>>>     enabled: true
> >>>>   }
> >>>> }
> >>>> "storage": {
> >>>>   s3: {
> >>>>     type: "file",
> >>>>     connection: "s3a://<my-bucket-name>",
> >>>>     config: {
> >>>>       "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
> >>>>       "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
> >>>>       "fs.s3a.impl.disable.cache": "false"
> >>>>     },
> >>>>     workspaces: {
> >>>>       "root": {
> >>>>         "location": "/",
> >>>>         "writable": false,
> >>>>         "defaultInputFormat": "parquet",
> >>>>         "allowAccessOutsideWorkspace": false
> >>>>       }
> >>>>     },
> >>>>     "formats": {
> >>>>       "parquet": {
> >>>>         "type": "parquet"
> >>>>       }
> >>>>     },
> >>>>     enabled: true
> >>>>   }
> >>>> }
> >>>>
> >>>>
> >>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <[email protected]> wrote:
> >>>>
> >>>>> Can you share your configs with any sensitive info redacted? The lists
> >>>>> don't support images, so please just cut/paste the json.
> >>>>> I had another idea...
> >>>>> -- C
> >>>>>
> >>>>>
> >>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <[email protected]> wrote:
> >>>>>>
> >>>>>> Yes, I think I'm following these instructions. And the file is not
> >>>>>> completely ignored. It creates additional format definitions. Let's say I
> >>>>>> white list some formats in my storage configuration and Drill adds more
> >>>>>> formats (which I don't want). Is there another way to start a "vanilla"
> >>>>>> Drill installation with my own configurations?
> >>>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Stefan,
> >>>>>>> My apologies.. Ok.. so the issue is that the storage-plugins-override.conf
> >>>>>>> is being ignored. I've never actually used this feature, so I wasn't
> >>>>>>> familiar with it, but are you following the instructions here [1] with
> >>>>>>> respect to configuration and restarting Drill? My suggestion would be to
> >>>>>>> remove all the plugins in the UI and only specify them in the .conf file.
> >>>>>>> Drill has an order of precedence and I suspect what is happening is that
> >>>>>>> the UI versions have a higher priority than the .conf versions. Does that
> >>>>>>> make sense?
> >>>>>>>
> >>>>>>> -- C
> >>>>>>>
> >>>>>>> [1]:
> >>>>>>> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi Charles
> >>>>>>>>
> >>>>>>>> I use a "storage-plugins-override.conf" file.
> >>>>>>>> My attempt is to have the configuration for my storages in a single
> >>>>>>>> file and Drill can pick up the configuration on startup. I put
> >>>>>>>> "storage-plugins-override.conf" in the conf directory and Drill creates
> >>>>>>>> the storages on startup but (and that is my problem) also creates all
> >>>>>>>> formats for every storage defined in my config file. E.g. I have a
> >>>>>>>> (local) file type storage and I define two formats (parquet and json) in
> >>>>>>>> it. Drill does not respect my restriction to two formats in the config
> >>>>>>>> file but creates all formats known to Drill (like iceberg, xml etc.).
> >>>>>>>>
> >>>>>>>> regards
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Stefan,
> >>>>>>>>> Thanks for your interest in Drill. You have to define the format config
> >>>>>>>>> for each storage plugin. Otherwise Drill doesn't know what extension to
> >>>>>>>>> associate with what format plugin. Out of curiosity, why are you using
> >>>>>>>>> the .conf files for this?
> >>>>>>>>> -- C
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Not defining a format seems to prevent the user from querying the
> >>>>>>>>>> specific format. E.g. after deleting the xml format definition in the
> >>>>>>>>>> web gui, I'm not able to query xml files anymore. So I guess my
> >>>>>>>>>> assumption was right.
> >>>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction. Probably
> >>>>>>>>>>> I'm wrong.
> >>>>>>>>>>>
> >>>>>>>>>>> Stefan
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the storage
> >>>>>>>>>>>> plugins on startup. My storage configurations contain only one or two
> >>>>>>>>>>>> formats (parquet, json, csv). Checking the storages in the web gui I
> >>>>>>>>>>>> noticed that for all the storages all formats are enabled, e.g.
> >>>>>>>>>>>> msaccess, iceberg etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is this on purpose or did I do something wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Example configuration:
> >>>>>>>>>>>>
> >>>>>>>>>>>> "storage": {
> >>>>>>>>>>>>   dfs: {
> >>>>>>>>>>>>     type: "file",
> >>>>>>>>>>>>     connection: "file:///",
> >>>>>>>>>>>>     workspaces: {
> >>>>>>>>>>>>       "tmp": {
> >>>>>>>>>>>>         "location": "/tmp",
> >>>>>>>>>>>>         "writable": true,
> >>>>>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>>>>       },
> >>>>>>>>>>>>       "root": {
> >>>>>>>>>>>>         "location": "/",
> >>>>>>>>>>>>         "writable": false,
> >>>>>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>>>>       }
> >>>>>>>>>>>>     },
> >>>>>>>>>>>>     formats: {
> >>>>>>>>>>>>       "parquet": {
> >>>>>>>>>>>>         "type": "parquet"
> >>>>>>>>>>>>       },
> >>>>>>>>>>>>       "json": {
> >>>>>>>>>>>>         "type": "json",
> >>>>>>>>>>>>         "extensions": [
> >>>>>>>>>>>>           "json"
> >>>>>>>>>>>>         ]
> >>>>>>>>>>>>       }
> >>>>>>>>>>>>     },
> >>>>>>>>>>>>     enabled: true
> >>>>>>>>>>>>   }
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> regards
> >>>>>>>>>>>> Stefan
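
Note on the idea raised at the top of this thread (fixing the format list once and reusing the resulting config, e.g. in a Docker image): the pruning step itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a Drill API. The function names restrict_formats and extra_formats are hypothetical, the sample config is a shortened stand-in for the dfs config quoted above, and the thread does not say how the files under /tmp/drill are laid out, so the sketch works on JSON text rather than on those files.

```python
import json


def restrict_formats(plugin_config, allowed):
    """Return a copy of a plugin config whose "formats" map keeps only allowed names.

    Hypothetical helper: it only edits the JSON structure shown in the thread
    and does not talk to Drill in any way.
    """
    pruned = dict(plugin_config)  # shallow copy, the input dict is left untouched
    formats = plugin_config.get("formats") or {}
    pruned["formats"] = {name: cfg for name, cfg in formats.items() if name in allowed}
    return pruned


def extra_formats(plugin_config, allowed):
    """List format names that are present in the config but not in the allow-list."""
    return sorted(set(plugin_config.get("formats") or {}) - set(allowed))


# Shortened stand-in for the dfs config shown in the web UI (assumption: same shape).
ui_config = json.loads("""
{
  "type": "file",
  "connection": "file:///",
  "formats": {
    "parquet": {"type": "parquet"},
    "json": {"type": "json", "extensions": ["json"]},
    "xml": {"type": "xml", "extensions": ["xml"], "dataLevel": 1}
  },
  "enabled": true
}
""")

allowed = {"parquet", "json"}

print(extra_formats(ui_config, allowed))  # ['xml']
print(json.dumps(restrict_formats(ui_config, allowed), indent=2))
```

The first call reports which formats exist beyond the two that were meant to be allowed; the second prints a config with only parquet and json left, which could then be pasted back into the UI or kept as a reference. Whether Drill keeps the format list restricted after a restart is exactly the open question of this thread, so the result still needs to be checked against a running instance.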
