Hi Charles

I'm not sure I understand you correctly: what do you mean by “not 
conflicting”? My aim is to not use the UI at all to configure storages. I 
thought this could be achieved by overriding the default storages with the 
“override” file. This seems to work, except for the strange behaviour with the 
formats.

regards
Stefan

________________________________
From: Charles Givre <[email protected]>
Sent: Wednesday, July 12, 2023 5:04 PM
To: user <[email protected]>
Subject: Re: Respecting formats restriction when using 
storage-plugins-override.conf

Hi Stefan,
My biggest piece of advice here would just be to make sure the plugins 
specified in the override file do not conflict with the UI-based configs. It 
may make sense to have completely different configs in each location, for example:

dfs-conf and (plain) dfs.

I think that should solve all issues. In theory, if you remove a format from 
the "formats" section, Drill should not be able to parse files of that type. 
So, for example, if you don't have the 'csv' or 'excel' format defined, then 
Drill will not be able to parse those formats.

Best,
-- C
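
As a rough illustration of the suggestion above, an override entry under a name
that differs from the UI-defined "dfs" might look like the sketch below. The
name "dfs-conf" comes from the message above; the workspace and format entries
are copied from the configs quoted later in this thread. Whether Drill still
adds its default formats to a plugin defined this way is not confirmed anywhere
in this thread, so treat it as an assumption to test rather than a known fix.

```hocon
# Sketch of a storage-plugins-override.conf entry (untested assumption).
# "dfs-conf" is used only so the name does not collide with the UI's "dfs".
"storage": {
  "dfs-conf": {
    type: "file",
    connection: "file:///",
    workspaces: {
      "home": {
        "location": "/Users/stefan",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      }
    },
    # Only the formats that should be queryable are listed here.
    formats: {
      "parquet": {
        "type": "parquet"
      },
      "json": {
        "type": "json",
        "extensions": [
          "json"
        ]
      }
    },
    enabled: true
  }
}
```

After restarting Drill, the plugin page for "dfs-conf" in the web UI would show
whether the "formats" section still contains only parquet and json.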


> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <[email protected]> 
> wrote:
>
> The config for dfs in the UI looks like this:
>
> {
>  "type": "file",
>  "connection": "file:///",
>  "workspaces": {
>    "tmp": {
>      "location": "/tmp",
>      "writable": true,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    },
>    "root": {
>      "location": "/",
>      "writable": false,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    },
>    "home": {
>      "location": "/Users/stefan",
>      "writable": true,
>      "defaultInputFormat": null,
>      "allowAccessOutsideWorkspace": false
>    }
>  },
>  "formats": {
>    "parquet": {
>      "type": "parquet"
>    },
>    "json": {
>      "type": "json",
>      "extensions": [
>        "json"
>      ]
>    },
>    "excel": {
>      "type": "excel",
>      "extensions": [
>        "xlsx"
>      ],
>      "lastRow": 1048576,
>      "ignoreErrors": true,
>      "maxArraySize": -1,
>      "thresholdBytesForTempFiles": -1
>    },
>    "spss": {
>      "type": "spss",
>      "extensions": [
>        "sav"
>      ]
>    },
>    "iceberg": {
>      "type": "iceberg",
>      "properties": null,
>      "caseSensitive": null,
>      "includeColumnStats": null,
>      "ignoreResiduals": null,
>      "snapshotId": null,
>      "snapshotAsOfTime": null,
>      "fromSnapshotId": null,
>      "toSnapshotId": null
>    },
>    "httpd": {
>      "type": "httpd",
>      "extensions": [
>        "httpd"
>      ],
>      "logFormat": "common\ncombined"
>    },
>    "xml": {
>      "type": "xml",
>      "extensions": [
>        "xml"
>      ],
>      "dataLevel": 1
>    },
>    "syslog": {
>      "type": "syslog",
>      "extensions": [
>        "syslog"
>      ],
>      "maxErrors": 10
>    },
>    "msaccess": {
>      "type": "msaccess",
>      "extensions": [
>        "mdb",
>        "accdb"
>      ]
>    },
>    "hdf5": {
>      "type": "hdf5",
>      "extensions": [
>        "h5"
>      ],
>      "defaultPath": null
>    },
>    "ltsv": {
>      "type": "ltsv",
>      "extensions": [
>        "ltsv"
>      ],
>      "parseMode": "lenient",
>      "escapeCharacter": null,
>      "kvDelimiter": null,
>      "entryDelimiter": null,
>      "lineEnding": null,
>      "quoteChar": null
>    },
>    "delta": {
>      "type": "delta",
>      "version": null,
>      "timestamp": null
>    },
>    "shp": {
>      "type": "shp",
>      "extensions": [
>        "shp"
>      ]
>    },
>    "image": {
>      "type": "image",
>      "extensions": [
>        "jpg",
>        "jpeg",
>        "jpe",
>        "tif",
>        "tiff",
>        "dng",
>        "psd",
>        "png",
>        "bmp",
>        "gif",
>        "ico",
>        "pcx",
>        "wav",
>        "wave",
>        "avi",
>        "webp",
>        "mov",
>        "mp4",
>        "m4a",
>        "m4p",
>        "m4b",
>        "m4r",
>        "m4v",
>        "3gp",
>        "3g2",
>        "eps",
>        "epsf",
>        "epsi",
>        "ai",
>        "arw",
>        "crw",
>        "cr2",
>        "nef",
>        "orf",
>        "raf",
>        "rw2",
>        "rwl",
>        "srw",
>        "x3f"
>      ],
>      "fileSystemMetadata": true,
>      "descriptive": true
>    },
>    "pdf": {
>      "type": "pdf",
>      "extensions": [
>        "pdf"
>      ],
>      "extractHeaders": true,
>      "extractionAlgorithm": "basic"
>    },
>    "sas": {
>      "type": "sas",
>      "extensions": [
>        "sas7bdat"
>      ]
>    },
>    "pcap": {
>      "type": "pcap",
>      "extensions": [
>        "pcap",
>        "pcapng"
>      ]
>    }
>  },
>  "authMode": "SHARED_USER",
>  "enabled": true
> }
>
> I'm now able to query some XML data with "SELECT * FROM
> dfs.home.`ch.so.afu.abbaustellen.xml`;", which I actually don't want to be
> able to do (see formats in the "storage-plugins-override.conf" file). If I
> remove the xml format section from the config in the UI, I'm no longer able
> to query the xml file: "Error: VALIDATION ERROR: From line 1, column 15 to
> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
> 'dfs.home'".
>
> regards
> Stefan
>
>
> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <[email protected]> wrote:
>
>> Hi Stefan,
>> What's in the config in the UI?  Can you also please clarify what queries
>> are running which indicate that your configs aren't working?
>> Best,
>> -- C
>>
>>
>>
>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <[email protected]>
>> wrote:
>>>
>>> "storage": {
>>> cp: {
>>>   type: "file",
>>>   connection: "classpath:///",
>>>   formats: {
>>>     "csv" : {
>>>       type: "text",
>>>       extensions: [ "csv" ],
>>>       delimiter: ","
>>>     }
>>>   }
>>>   enabled: true
>>> }
>>> }
>>> "storage": {
>>> dfs: {
>>>   type: "file",
>>>   connection: "file:///",
>>>   workspaces: {
>>>     "tmp": {
>>>       "location": "/tmp",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "home": {
>>>       "location": "/Users/stefan",
>>>       "writable": true,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     },
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": null,
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   formats: {
>>>     "parquet": {
>>>       "type": "parquet"
>>>     },
>>>     "json": {
>>>       "type": "json",
>>>       "extensions": [
>>>         "json"
>>>       ]
>>>     }
>>>   },
>>>   enabled: true
>>> }
>>> }
>>> "storage": {
>>> s3: {
>>>   type: "file",
>>>   connection: "s3a://<my-bucket-name>",
>>>   config: {
>>>     "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
>>>     "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
>>>     "fs.s3a.impl.disable.cache": "false"
>>>   },
>>>   workspaces: {
>>>     "root": {
>>>       "location": "/",
>>>       "writable": false,
>>>       "defaultInputFormat": "parquet",
>>>       "allowAccessOutsideWorkspace": false
>>>     }
>>>   },
>>>   "formats": {
>>>     "parquet": {
>>>       "type": "parquet"
>>>     }
>>>   },
>>>   enabled: true
>>> }
>>> }
>>>
>>>
>>>
>>>
>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <[email protected]> wrote:
>>>
>>>> Can you share your configs with any sensitive info redacted?  The lists
>>>> don't support images, so please just cut/paste the json.
>>>> I had another idea...
>>>> -- C
>>>>
>>>>
>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <
>>>> [email protected]> wrote:
>>>>>
>>>>> Yes, I think I'm following these instructions. And the file is not
>>>>> completely ignored. It creates additional format definitions. Let's say I
>>>>> whitelist some formats in my storage configuration and Drill adds more
>>>>> formats (which I don't want). Is there another way to start a "vanilla"
>>>>> Drill installation with my own configurations?
>>>>>
>>>>> Stefan
>>>>>
>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <[email protected]>
>> wrote:
>>>>>
>>>>>> Hi Stefan,
>>>>>> My apologies. OK, so the issue is that the storage-plugins-override.conf
>>>>>> is being ignored. I've never actually used this feature, so I wasn't
>>>>>> familiar with it, but are you following the instructions here [1] with
>>>>>> respect to configuration and restarting Drill? My suggestion would be to
>>>>>> remove all the plugins in the UI and only specify them in the .conf file.
>>>>>> Drill has an order of precedence and I suspect what is happening is that
>>>>>> the UI versions have a higher priority than the .conf versions. Does that
>>>>>> make sense?
>>>>>>
>>>>>> -- C
>>>>>>
>>>>>> [1]: https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <
>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>> Hi Charles
>>>>>>>
>>>>>>> I use a "storage-plugins-override.conf" file. My aim is to have the
>>>>>>> configuration for my storages in a single file that Drill picks up
>>>>>>> on startup. I put "storage-plugins-override.conf" in the conf
>>>>>>> directory and Drill creates the storages on startup, but (and that is my
>>>>>>> problem) it also creates all formats for every storage defined in my
>>>>>>> config file. E.g. I have a (local) file type storage and I define two
>>>>>>> formats (parquet and json) in it. Drill does not respect my restriction
>>>>>>> to two formats in the config file but creates all formats known to Drill
>>>>>>> (like iceberg, xml etc.).
>>>>>>>
>>>>>>> regards
>>>>>>> Stefan
>>>>>>>
>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <[email protected]>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi Stefan,
>>>>>>>> Thanks for your interest in Drill. You have to define the format config
>>>>>>>> for each storage plugin. Otherwise Drill doesn't know what extension to
>>>>>>>> associate with what format plugin. Out of curiosity, why are you using
>>>>>>>> the .conf files for this?
>>>>>>>> -- C
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <
>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Not defining a format seems to prevent the user from querying the
>>>>>>>>> specific format. E.g. after deleting the xml format definition in the
>>>>>>>>> web gui, I'm not able to query xml files anymore. So I guess my
>>>>>>>>> assumption was right.
>>>>>>>>>
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <
>>>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction.
>>>>>>>>>> Probably I'm wrong.
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <
>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the storage
>>>>>>>>>>> plugins on startup. My storage configurations contain only one or two
>>>>>>>>>>> formats (parquet, json, csv). Checking the storages in the web gui I
>>>>>>>>>>> noticed that for all the storages all formats are enabled, e.g.
>>>>>>>>>>> msaccess, iceberg etc.
>>>>>>>>>>>
>>>>>>>>>>> Is this on purpose or did I do something wrong?
>>>>>>>>>>>
>>>>>>>>>>> Example configuration:
>>>>>>>>>>>
>>>>>>>>>>> "storage": {
>>>>>>>>>>> dfs: {
>>>>>>>>>>> type: "file",
>>>>>>>>>>> connection: "file:///",
>>>>>>>>>>> workspaces: {
>>>>>>>>>>>  "tmp": {
>>>>>>>>>>>    "location": "/tmp",
>>>>>>>>>>>    "writable": true,
>>>>>>>>>>>    "defaultInputFormat": null,
>>>>>>>>>>>    "allowAccessOutsideWorkspace": false
>>>>>>>>>>>  },
>>>>>>>>>>>  "root": {
>>>>>>>>>>>    "location": "/",
>>>>>>>>>>>    "writable": false,
>>>>>>>>>>>    "defaultInputFormat": null,
>>>>>>>>>>>    "allowAccessOutsideWorkspace": false
>>>>>>>>>>>  }
>>>>>>>>>>> },
>>>>>>>>>>> formats: {
>>>>>>>>>>>  "parquet": {
>>>>>>>>>>>    "type": "parquet"
>>>>>>>>>>>  },
>>>>>>>>>>>  "json": {
>>>>>>>>>>>    "type": "json",
>>>>>>>>>>>    "extensions": [
>>>>>>>>>>>      "json"
>>>>>>>>>>>    ]
>>>>>>>>>>>  }
>>>>>>>>>>> },
>>>>>>>>>>> enabled: true
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> regards
>>>>>>>>>>> Stefan
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
