Hi Stefan, 
My main advice would be to make sure the plugins specified in the override
file do not conflict with the UI-based configs. It may make sense to use
completely different configs, with distinct plugin names, in each location, e.g.:

dfs-conf and (plain) dfs.

I think that should solve all the issues. In theory, if you remove a format
from the "formats" section, Drill should no longer be able to read files of
that type. For example, if the plugin has no 'csv' or 'excel' format, Drill
will not be able to query CSV or Excel files through it.

Best,
-- C


> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
> 
> The config for dfs in the UI looks like this:
> 
> {
>  "type": "file",
>  "connection": "file:///",
>  "workspaces": {
>    "tmp": {
>      "location": "/tmp",
>      "writable": true,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    },
>    "root": {
>      "location": "/",
>      "writable": false,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    },
>    "home": {
>      "location": "/Users/stefan",
>      "writable": true,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    }
>  },
>  "formats": {
>    "parquet": {
>      "type": "parquet"
>    },
>    "json": {
>      "type": "json",
>      "extensions": [
>        "json"
>      ]
>    },
>    "excel": {
>      "type": "excel",
>      "extensions": [
>        "xlsx"
>      ],
>      "lastRow": 1048576,
>      "ignoreErrors": true,
>      "maxArraySize": -1,
>      "thresholdBytesForTempFiles": -1
>    },
>    "spss": {
>      "type": "spss",
>      "extensions": [
>        "sav"
>      ]
>    },
>    "iceberg": {
>      "type": "iceberg",
>      "properties": null,
>      "caseSensitive": null,
>      "includeColumnStats": null,
>      "ignoreResiduals": null,
>      "snapshotId": null,
>      "snapshotAsOfTime": null,
>      "fromSnapshotId": null,
>      "toSnapshotId": null
>    },
>    "httpd": {
>      "type": "httpd",
>      "extensions": [
>        "httpd"
>      ],
>      "logFormat": "common\ncombined"
>    },
>    "xml": {
>      "type": "xml",
>      "extensions": [
>        "xml"
>      ],
>      "dataLevel": 1
>    },
>    "syslog": {
>      "type": "syslog",
>      "extensions": [
>        "syslog"
>      ],
>      "maxErrors": 10
>    },
>    "msaccess": {
>      "type": "msaccess",
>      "extensions": [
>        "mdb",
>        "accdb"
>      ]
>    },
>    "hdf5": {
>      "type": "hdf5",
>      "extensions": [
>        "h5"
>      ],
>      "defaultPath": null
>    },
>    "ltsv": {
>      "type": "ltsv",
>      "extensions": [
>        "ltsv"
>      ],
>      "parseMode": "lenient",
>      "escapeCharacter": null,
>      "kvDelimiter": null,
>      "entryDelimiter": null,
>      "lineEnding": null,
>      "quoteChar": null
>    },
>    "delta": {
>      "type": "delta",
>      "version": null,
>      "timestamp": null
>    },
>    "shp": {
>      "type": "shp",
>      "extensions": [
>        "shp"
>      ]
>    },
>    "image": {
>      "type": "image",
>      "extensions": [
>        "jpg",
>        "jpeg",
>        "jpe",
>        "tif",
>        "tiff",
>        "dng",
>        "psd",
>        "png",
>        "bmp",
>        "gif",
>        "ico",
>        "pcx",
>        "wav",
>        "wave",
>        "avi",
>        "webp",
>        "mov",
>        "mp4",
>        "m4a",
>        "m4p",
>        "m4b",
>        "m4r",
>        "m4v",
>        "3gp",
>        "3g2",
>        "eps",
>        "epsf",
>        "epsi",
>        "ai",
>        "arw",
>        "crw",
>        "cr2",
>        "nef",
>        "orf",
>        "raf",
>        "rw2",
>        "rwl",
>        "srw",
>        "x3f"
>      ],
>      "fileSystemMetadata": true,
>      "descriptive": true
>    },
>    "pdf": {
>      "type": "pdf",
>      "extensions": [
>        "pdf"
>      ],
>      "extractHeaders": true,
>      "extractionAlgorithm": "basic"
>    },
>    "sas": {
>      "type": "sas",
>      "extensions": [
>        "sas7bdat"
>      ]
>    },
>    "pcap": {
>      "type": "pcap",
>      "extensions": [
>        "pcap",
>        "pcapng"
>      ]
>    }
>  },
>  "authMode": "SHARED_USER",
>  "enabled": true
> }
> 
> I'm now able to query some XML data ("SELECT * FROM
> dfs.home.`ch.so.afu.abbaustellen.xml`;"), which I actually don't want to
> be possible (see the formats in the "storage-plugins-override.conf" file).
> If I remove the xml format section from the config in the UI, I can no
> longer query the XML file: "Error: VALIDATION ERROR: From line 1, column 15
> to line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
> 'dfs.home'".
> 
> regards
> Stefan
> 
> 
> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <cgi...@gmail.com> wrote:
> 
>> Hi Stefan,
>> What's in the config in the UI? Can you also please clarify which queries
>> you are running that indicate your configs aren't working?
>> Best,
>> -- C
>> 
>> 
>> 
>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>> 
>>> "storage": {
>>> cp: {
>>>   type: "file",
>>>   connection: "classpath:///",
>>>   formats: {
>>>     "csv" : {
>>>       type: "text",
>>>       extensions: [ "csv" ],
>>>       delimiter: ","
>>>     }
>>>   }
>>>   enabled: true
>>> }
>>> }
>>> "storage": {
>>> dfs: {
>>>   type: "file",
>>>   connection: "file:///",
>>>   workspaces: {
>>>     "tmp": {
>>>       "location": "/tmp",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "home": {
>>>       "location": "/Users/stefan",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   formats: {
>>>     "parquet": {
>>>       "type": "parquet"
>>>     },
>>>     "json": {
>>>       "type": "json",
>>>       "extensions": [
>>>         "json"
>>>       ]
>>>     }
>>>   },
>>>   enabled: true
>>> }
>>> }
>>> "storage": {
>>> s3: {
>>>   type: "file",
>>>   connection: "s3a://<my-bucket-name>",
>>>   config: {
>>>     "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
>>>     "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
>>>     "fs.s3a.impl.disable.cache": "false"
>>>   },
>>>   workspaces: {
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": "parquet",
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   "formats": {
>>>     "parquet": {
>>>       "type": "parquet"
>>>     }
>>>   },
>>>   enabled: true
>>> }
>>> }
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <cgi...@gmail.com> wrote:
>>> 
>>>> Can you share your configs with any sensitive info redacted? The list
>>>> doesn't support images, so please just copy and paste the JSON.
>>>> I had another idea...
>>>> -- C
>>>> 
>>>> 
>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>> 
>>>>> Yes, I think I'm following these instructions. And the file is not
>>>>> completely ignored: Drill creates additional format definitions. Let's say
>>>>> I whitelist some formats in my storage configuration; Drill then adds more
>>>>> formats (which I don't want). Is there another way to start a "vanilla"
>>>>> Drill installation with my own configurations?
>>>>> 
>>>>> Stefan
>>>>> 
>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <cgi...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Stefan,
>>>>>> My apologies. OK, so the issue is that the storage-plugins-override.conf
>>>>>> is being ignored. I've never actually used this feature, so I wasn't
>>>>>> familiar with it, but are you following the instructions here [1] with
>>>>>> respect to configuration and restarting Drill? My suggestion would be to
>>>>>> remove all the plugins in the UI and only specify them in the .conf file.
>>>>>> Drill has an order of precedence, and I suspect that the UI versions have
>>>>>> a higher priority than the .conf versions. Does that make sense?
>>>>>> 
>>>>>> -- C
>>>>>> 
>>>>>> [1]: https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Charles
>>>>>>> 
>>>>>>> I use a "storage-plugins-override.conf" file. My goal is to have the
>>>>>>> configuration for my storage plugins in a single file that Drill picks
>>>>>>> up on startup. I put "storage-plugins-override.conf" in the conf
>>>>>>> directory, and Drill creates the storage plugins on startup, but (and
>>>>>>> that is my problem) it also creates all formats for every storage plugin
>>>>>>> defined in my config file. E.g. I have a (local) file-type storage
>>>>>>> plugin and I define two formats (parquet and json) in it. Drill does not
>>>>>>> respect my restriction to two formats in the config file but creates all
>>>>>>> formats known to Drill (like iceberg, xml, etc.).
>>>>>>> 
>>>>>>> regards
>>>>>>> Stefan
>>>>>>> 
>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <cgi...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Stefan,
>>>>>>>> Thanks for your interest in Drill. You have to define the format config
>>>>>>>> for each storage plugin. Otherwise Drill doesn't know which extension to
>>>>>>>> associate with which format plugin. Out of curiosity, why are you using
>>>>>>>> the .conf files for this?
>>>>>>>> -- C
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Not defining a format seems to prevent the user from querying that
>>>>>>>>> format. E.g. after deleting the xml format definition in the web GUI,
>>>>>>>>> I'm no longer able to query xml files. So I guess my assumption was
>>>>>>>>> right.
>>>>>>>>> 
>>>>>>>>> Stefan
>>>>>>>>> 
>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction. I'm
>>>>>>>>>> probably wrong.
>>>>>>>>>> 
>>>>>>>>>> Stefan
>>>>>>>>>> 
>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> 
>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the storage plugins
>>>>>>>>>>> on startup. My storage configurations contain only one or two formats
>>>>>>>>>>> (parquet, json, csv). Checking the storage plugins in the web GUI, I
>>>>>>>>>>> noticed that all formats are enabled for every storage plugin, e.g.
>>>>>>>>>>> msaccess, iceberg, etc.
>>>>>>>>>>> 
>>>>>>>>>>> Is this on purpose or did I do something wrong?
>>>>>>>>>>> 
>>>>>>>>>>> Example configuration:
>>>>>>>>>>> 
>>>>>>>>>>> "storage": {
>>>>>>>>>>> dfs: {
>>>>>>>>>>> type: "file",
>>>>>>>>>>> connection: "file:///",
>>>>>>>>>>> workspaces: {
>>>>>>>>>>>  "tmp": {
>>>>>>>>>>>    "location": "/tmp",
>>>>>>>>>>>    "writable": true,
>>>>>>>>>>>    "defaultInputFormat": null,
>>>>>>>>>>>    "allowAccessOutsideWorkspace": false
>>>>>>>>>>>  },
>>>>>>>>>>>  "root": {
>>>>>>>>>>>    "location": "/",
>>>>>>>>>>>    "writable": false,
>>>>>>>>>>>    "defaultInputFormat": null,
>>>>>>>>>>>    "allowAccessOutsideWorkspace": false
>>>>>>>>>>>  }
>>>>>>>>>>> },
>>>>>>>>>>> formats: {
>>>>>>>>>>>  "parquet": {
>>>>>>>>>>>    "type": "parquet"
>>>>>>>>>>>  },
>>>>>>>>>>>  "json": {
>>>>>>>>>>>    "type": "json",
>>>>>>>>>>>    "extensions": [
>>>>>>>>>>>      "json"
>>>>>>>>>>>    ]
>>>>>>>>>>>  }
>>>>>>>>>>> },
>>>>>>>>>>> enabled: true
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> regards
>>>>>>>>>>> Stefan
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
