The config for dfs in the UI looks like this:

{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "home": {
      "location": "/Users/stefan",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": [ "json" ]
    },
    "excel": {
      "type": "excel",
      "extensions": [ "xlsx" ],
      "lastRow": 1048576,
      "ignoreErrors": true,
      "maxArraySize": -1,
      "thresholdBytesForTempFiles": -1
    },
    "spss": {
      "type": "spss",
      "extensions": [ "sav" ]
    },
    "iceberg": {
      "type": "iceberg",
      "properties": null,
      "caseSensitive": null,
      "includeColumnStats": null,
      "ignoreResiduals": null,
      "snapshotId": null,
      "snapshotAsOfTime": null,
      "fromSnapshotId": null,
      "toSnapshotId": null
    },
    "httpd": {
      "type": "httpd",
      "extensions": [ "httpd" ],
      "logFormat": "common\ncombined"
    },
    "xml": {
      "type": "xml",
      "extensions": [ "xml" ],
      "dataLevel": 1
    },
    "syslog": {
      "type": "syslog",
      "extensions": [ "syslog" ],
      "maxErrors": 10
    },
    "msaccess": {
      "type": "msaccess",
      "extensions": [ "mdb", "accdb" ]
    },
    "hdf5": {
      "type": "hdf5",
      "extensions": [ "h5" ],
      "defaultPath": null
    },
    "ltsv": {
      "type": "ltsv",
      "extensions": [ "ltsv" ],
      "parseMode": "lenient",
      "escapeCharacter": null,
      "kvDelimiter": null,
      "entryDelimiter": null,
      "lineEnding": null,
      "quoteChar": null
    },
    "delta": {
      "type": "delta",
      "version": null,
      "timestamp": null
    },
    "shp": {
      "type": "shp",
      "extensions": [ "shp" ]
    },
    "image": {
      "type": "image",
      "extensions": [
        "jpg", "jpeg", "jpe", "tif", "tiff", "dng", "psd", "png", "bmp",
        "gif", "ico", "pcx", "wav", "wave", "avi", "webp", "mov", "mp4",
        "m4a", "m4p", "m4b", "m4r", "m4v", "3gp", "3g2", "eps", "epsf",
        "epsi", "ai", "arw", "crw", "cr2", "nef", "orf", "raf", "rw2",
        "rwl", "srw", "x3f"
      ],
      "fileSystemMetadata": true,
      "descriptive": true
    },
    "pdf": {
      "type": "pdf",
      "extensions": [ "pdf" ],
      "extractHeaders": true,
      "extractionAlgorithm": "basic"
    },
    "sas": {
      "type": "sas",
      "extensions": [ "sas7bdat" ]
    },
    "pcap": {
      "type": "pcap",
      "extensions": [ "pcap", "pcapng" ]
    }
  },
  "authMode": "SHARED_USER",
  "enabled": true
}

I'm now able to query some XML data:

"SELECT * FROM dfs.home.`ch.so.afu.abbaustellen.xml`;"

I don't actually want that to be possible (see the formats in the "storage-plugins-override.conf" file). If I remove the xml format section from the config in the UI, I can no longer query the XML file:

"Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within 'dfs.home'".

regards
Stefan

On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <cgi...@gmail.com> wrote:

> Hi Stefan,
> What's in the config in the UI? Can you also please clarify what queries
> are running which indicate that your configs aren't working?
> Best,
> -- C
>
>
> > On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >
> > "storage": {
> >   cp: {
> >     type: "file",
> >     connection: "classpath:///",
> >     formats: {
> >       "csv" : {
> >         type: "text",
> >         extensions: [ "csv" ],
> >         delimiter: ","
> >       }
> >     }
> >     enabled: true
> >   }
> > }
> > "storage": {
> >   dfs: {
> >     type: "file",
> >     connection: "file:///",
> >     workspaces: {
> >       "tmp": {
> >         "location": "/tmp",
> >         "writable": true,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       },
> >       "home": {
> >         "location": "/Users/stefan",
> >         "writable": true,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       },
> >       "root": {
> >         "location": "/",
> >         "writable": false,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       }
> >     },
> >     formats: {
> >       "parquet": {
> >         "type": "parquet"
> >       },
> >       "json": {
> >         "type": "json",
> >         "extensions": [
> >           "json"
> >         ]
> >       }
> >     },
> >     enabled: true
> >   }
> > }
> > "storage": {
> >   s3: {
> >     type: "file",
> >     connection: "s3a://<my-bucket-name>",
> >     config: {
> >       "fs.s3a.aws.credentials.provider":
> >         "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
> >       "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
> >       "fs.s3a.impl.disable.cache": "false"
> >     },
> >     workspaces: {
> >       "root": {
> >         "location": "/",
> >         "writable": false,
> >         "defaultInputFormat": "parquet",
> >         "allowAccessOutsideWorkspace": false
> >       }
> >     },
> >     "formats": {
> >       "parquet": {
> >         "type": "parquet"
> >       }
> >     },
> >     enabled: true
> >   }
> > }
> >
> >
> > On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <cgi...@gmail.com> wrote:
> >
> >> Can you share your configs with any sensitive info redacted? The lists
> >> don't support images, so please just cut/paste the json.
> >> I had another idea...
> >> -- C
> >>
> >>
> >>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >>>
> >>> Yes, I think I'm following these instructions. And the file is not
> >>> completely ignored. It creates additional format definitions. Let's say I
> >>> whitelist some formats in my storage configuration and Drill adds more
> >>> formats (which I don't want). Is there another way to start a "vanilla"
> >>> Drill installation with my own configurations?
> >>>
> >>> Stefan
> >>>
> >>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <cgi...@gmail.com> wrote:
> >>>
> >>>> Hi Stefan,
> >>>> My apologies.. Ok.. so the issue is that the storage-plugins-override.conf
> >>>> is being ignored. I've never actually used this feature, so I wasn't
> >>>> familiar with it, but are you following the instructions here [1] with
> >>>> respect to configuration and restarting Drill? My suggestion would be to
> >>>> remove all the plugins in the UI and only specify them in the .conf file.
> >>>> Drill has an order of precedence and I suspect what is happening is that
> >>>> the UI versions have a higher priority than the .conf versions. Does that
> >>>> make sense?
> >>>>
> >>>> -- C
> >>>>
> >>>> [1]:
> >>>> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
> >>>>
> >>>>
> >>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >>>>>
> >>>>> Hi Charles
> >>>>>
> >>>>> I use a "storage-plugins-override.conf" file. My aim is to have the
> >>>>> configuration for my storages in a single file that Drill can pick up
> >>>>> on startup. I put "storage-plugins-override.conf" in the conf
> >>>>> directory and Drill creates the storages on startup but (and that is my
> >>>>> problem) also creates all formats for every storage defined in my config
> >>>>> file. E.g. I have a (local) file type storage and I define two formats
> >>>>> (parquet and json) in it. Drill does not respect my restriction to two
> >>>>> formats in the config file but creates all formats known to Drill (like
> >>>>> iceberg, xml etc.).
> >>>>>
> >>>>> regards
> >>>>> Stefan
> >>>>>
> >>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <cgi...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Stefan,
> >>>>>> Thanks for your interest in Drill. You have to define the format config
> >>>>>> for each storage plugin. Otherwise Drill doesn't know what extension to
> >>>>>> associate with what format plugin. Out of curiosity, why are you using the
> >>>>>> .conf files for this?
> >>>>>> -- C
> >>>>>>
> >>>>>>
> >>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Not defining a format seems to prevent the user from querying the specific
> >>>>>>> format. E.g. after deleting the xml format definition in the web gui, I'm
> >>>>>>> not able to query xml files anymore. So I guess my assumption was right.
> >>>>>>>
> >>>>>>> Stefan
> >>>>>>>
> >>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Btw: I assumed that the list of formats acts as a restriction. Probably I'm
> >>>>>>>> wrong.
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I'm using storage-plugins-override.conf to configure the storage plugins
> >>>>>>>>> on startup. My storage configurations contain only one or two formats
> >>>>>>>>> (parquet, json, csv). Checking the storages in the web gui I noticed that
> >>>>>>>>> for all the storages all formats are enabled, e.g. msaccess, iceberg etc.
> >>>>>>>>>
> >>>>>>>>> Is this on purpose or did I do something wrong?
> >>>>>>>>>
> >>>>>>>>> Example configuration:
> >>>>>>>>>
> >>>>>>>>> "storage": {
> >>>>>>>>>   dfs: {
> >>>>>>>>>     type: "file",
> >>>>>>>>>     connection: "file:///",
> >>>>>>>>>     workspaces: {
> >>>>>>>>>       "tmp": {
> >>>>>>>>>         "location": "/tmp",
> >>>>>>>>>         "writable": true,
> >>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>       },
> >>>>>>>>>       "root": {
> >>>>>>>>>         "location": "/",
> >>>>>>>>>         "writable": false,
> >>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>       }
> >>>>>>>>>     },
> >>>>>>>>>     formats: {
> >>>>>>>>>       "parquet": {
> >>>>>>>>>         "type": "parquet"
> >>>>>>>>>       },
> >>>>>>>>>       "json": {
> >>>>>>>>>         "type": "json",
> >>>>>>>>>         "extensions": [
> >>>>>>>>>           "json"
> >>>>>>>>>         ]
> >>>>>>>>>       }
> >>>>>>>>>     },
> >>>>>>>>>     enabled: true
> >>>>>>>>>   }
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> regards
> >>>>>>>>> Stefan