Thanks. Yes. I'm going to try the renaming approach.

Not a rant but isn't the whole point of a "storage-plugins-override.conf"
to override storage plugin configuration?

Btw: I'm in embedded mode. So I guess I can also use the config files from
/tmp/drill after "fixing" the format configuration in the UI and use them
e.g. in a docker image.
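
One way to check that a stored plugin configuration really is restricted to the intended formats is to compare the keys of its "formats" section against an allow-list. The following is a minimal Python sketch, not part of Drill: the function name unexpected_formats and the trimmed sample JSON are assumptions for illustration, and the JSON text is expected to be pasted from the UI or read from wherever the embedded-mode configs are stored.

```python
import json


def unexpected_formats(plugin_config_json, allowed):
    """Return the format names in a plugin config that are not in `allowed`.

    `plugin_config_json` is the JSON text of one storage plugin config.
    `allowed` is an iterable of format names that should be the only ones present.
    """
    config = json.loads(plugin_config_json)
    # A missing or null "formats" section is treated as "no formats defined".
    formats = config.get("formats") or {}
    return sorted(set(formats) - set(allowed))


# Trimmed, hypothetical sample modelled on the dfs config shown in this thread.
sample = """
{
  "type": "file",
  "connection": "file:///",
  "formats": {
    "parquet": {"type": "parquet"},
    "json": {"type": "json", "extensions": ["json"]},
    "xml": {"type": "xml", "extensions": ["xml"], "dataLevel": 1}
  },
  "enabled": true
}
"""

if __name__ == "__main__":
    # Lists every format that should not be there, here: ['xml']
    print(unexpected_formats(sample, {"parquet", "json"}))
```

An empty result would mean the stored config only contains the formats from the allow-list. This only inspects the JSON text, it does not talk to Drill.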

regards
Stefan

On Wed, Jul 12, 2023 at 6:04 PM Charles Givre <[email protected]> wrote:

> My sense of what is happening in your use case is that the configs that
> exist in the UI are overriding the conf file.   What it seems like you want
> is the opposite order of precedence.  I've never used the conf files for
> this, so I don't have a lot of experience with that, but it would seem that
> the best way to get your Drill cluster configured to do what you want is to
> delete or disable the configs in the UI and only use the ones in the config
> file.
>
> By conflicting I meant that let's say that you have a plugin called dfs
> that has the json format enabled.  If you put a configuration for a plugin
> also called dfs in the conf file, what I think is happening is that since
> you have two plugins with the same name, Drill will read the ones from the
> UI.  (FYSA, they aren't actually stored in the UI.  If you are using Drill
> in distributed mode, those configurations are stored in zookeeper.  If you
> are in embedded mode, they are stored on your drive somewhere.)
>
> Anyway,  IMHO, the best thing to do would be to make sure that the plugins
> in your conf file do not have the same names as the plugins that appear in
> the UI.  That's what I was getting at.  Does that make sense?
> Best,
> -- C
>
> > On Jul 12, 2023, at 11:57 AM, Stefan Ziegler <
> [email protected]> wrote:
> >
> > Hi Charles
> >
> > I'm not sure I understand you correctly: what do you mean by “not
> > conflicting”? My aim is to not use the UI at all to configure storages.
> > I thought this could be achieved by overriding the default storages with the
> > “override” file. This seems to work except for the strange behaviour with the
> > formats.
> >
> > regards
> > Stefan
> >
> > ________________________________
> > From: Charles Givre <[email protected]>
> > Sent: Wednesday, July 12, 2023 5:04 PM
> > To: user <[email protected]>
> > Subject: Re: Respecting formats restriction when using
> storage-plugins-override.conf
> >
> > Hi Stefan,
> > My biggest piece of advice here would just be to make sure the plugins
> specified in the override file do not conflict with the UI-based configs.
>  It may make sense to have completely different configs in each location.
> IE:
> >
> > dfs-conf and (plain) dfs.
> >
> > I think that should solve all issues.  In theory if you remove a config
> from the "formats" section, Drill should not be able to parse the file in
> question.  So for example if you don't have the 'csv' format or 'excel'
> then Drill will not be able to parse those formats.
> >
> > Best,
> > -- C
> >
> >
> >> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <
> [email protected]> wrote:
> >>
> >> The config for dfs in the UI looks like this:
> >>
> >> {
> >> "type": "file",
> >> "connection": "file:///",
> >> "workspaces": {
> >>   "tmp": {
> >>     "location": "/tmp",
> >>     "writable": true,
> >>     "defaultInputFormat": null,
> >>     "allowAccessOutsideWorkspace": false
> >>   },
> >>   "root": {
> >>     "location": "/",
> >>     "writable": false,
> >>     "defaultInputFormat": null,
> >>     "allowAccessOutsideWorkspace": false
> >>   },
> >>   "home": {
> >>     "location": "/Users/stefan",
> >>     "writable": true,
> >>     "defaultInputFormat": null,
> >>     "allowAccessOutsideWorkspace": false
> >>   }
> >> },
> >> "formats": {
> >>   "parquet": {
> >>     "type": "parquet"
> >>   },
> >>   "json": {
> >>     "type": "json",
> >>     "extensions": [
> >>       "json"
> >>     ]
> >>   },
> >>   "excel": {
> >>     "type": "excel",
> >>     "extensions": [
> >>       "xlsx"
> >>     ],
> >>     "lastRow": 1048576,
> >>     "ignoreErrors": true,
> >>     "maxArraySize": -1,
> >>     "thresholdBytesForTempFiles": -1
> >>   },
> >>   "spss": {
> >>     "type": "spss",
> >>     "extensions": [
> >>       "sav"
> >>     ]
> >>   },
> >>   "iceberg": {
> >>     "type": "iceberg",
> >>     "properties": null,
> >>     "caseSensitive": null,
> >>     "includeColumnStats": null,
> >>     "ignoreResiduals": null,
> >>     "snapshotId": null,
> >>     "snapshotAsOfTime": null,
> >>     "fromSnapshotId": null,
> >>     "toSnapshotId": null
> >>   },
> >>   "httpd": {
> >>     "type": "httpd",
> >>     "extensions": [
> >>       "httpd"
> >>     ],
> >>     "logFormat": "common\ncombined"
> >>   },
> >>   "xml": {
> >>     "type": "xml",
> >>     "extensions": [
> >>       "xml"
> >>     ],
> >>     "dataLevel": 1
> >>   },
> >>   "syslog": {
> >>     "type": "syslog",
> >>     "extensions": [
> >>       "syslog"
> >>     ],
> >>     "maxErrors": 10
> >>   },
> >>   "msaccess": {
> >>     "type": "msaccess",
> >>     "extensions": [
> >>       "mdb",
> >>       "accdb"
> >>     ]
> >>   },
> >>   "hdf5": {
> >>     "type": "hdf5",
> >>     "extensions": [
> >>       "h5"
> >>     ],
> >>     "defaultPath": null
> >>   },
> >>   "ltsv": {
> >>     "type": "ltsv",
> >>     "extensions": [
> >>       "ltsv"
> >>     ],
> >>     "parseMode": "lenient",
> >>     "escapeCharacter": null,
> >>     "kvDelimiter": null,
> >>     "entryDelimiter": null,
> >>     "lineEnding": null,
> >>     "quoteChar": null
> >>   },
> >>   "delta": {
> >>     "type": "delta",
> >>     "version": null,
> >>     "timestamp": null
> >>   },
> >>   "shp": {
> >>     "type": "shp",
> >>     "extensions": [
> >>       "shp"
> >>     ]
> >>   },
> >>   "image": {
> >>     "type": "image",
> >>     "extensions": [
> >>       "jpg",
> >>       "jpeg",
> >>       "jpe",
> >>       "tif",
> >>       "tiff",
> >>       "dng",
> >>       "psd",
> >>       "png",
> >>       "bmp",
> >>       "gif",
> >>       "ico",
> >>       "pcx",
> >>       "wav",
> >>       "wave",
> >>       "avi",
> >>       "webp",
> >>       "mov",
> >>       "mp4",
> >>       "m4a",
> >>       "m4p",
> >>       "m4b",
> >>       "m4r",
> >>       "m4v",
> >>       "3gp",
> >>       "3g2",
> >>       "eps",
> >>       "epsf",
> >>       "epsi",
> >>       "ai",
> >>       "arw",
> >>       "crw",
> >>       "cr2",
> >>       "nef",
> >>       "orf",
> >>       "raf",
> >>       "rw2",
> >>       "rwl",
> >>       "srw",
> >>       "x3f"
> >>     ],
> >>     "fileSystemMetadata": true,
> >>     "descriptive": true
> >>   },
> >>   "pdf": {
> >>     "type": "pdf",
> >>     "extensions": [
> >>       "pdf"
> >>     ],
> >>     "extractHeaders": true,
> >>     "extractionAlgorithm": "basic"
> >>   },
> >>   "sas": {
> >>     "type": "sas",
> >>     "extensions": [
> >>       "sas7bdat"
> >>     ]
> >>   },
> >>   "pcap": {
> >>     "type": "pcap",
> >>     "extensions": [
> >>       "pcap",
> >>       "pcapng"
> >>     ]
> >>   }
> >> },
> >> "authMode": "SHARED_USER",
> >> "enabled": true
> >> }
> >>
> >> I'm now able to query some XML data: "SELECT * FROM
> >> dfs.home.`ch.so.afu.abbaustellen.xml`;" which I actually don't want to be
> >> able to do (see formats in the "storage-plugins-override.conf" file). If I
> >> remove the xml format section in the config in the UI, I'm not able to
> >> query the xml anymore: "Error: VALIDATION ERROR: From line 1, column 15 to
> >> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
> >> 'dfs.home'".
> >>
> >> regards
> >> Stefan
> >>
> >>
> >> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <[email protected]> wrote:
> >>
> >>> Hi Stefan,
> >>> What's in the config in the UI?  Can you also please clarify what
> queries
> >>> are running which indicate that your configs aren't working?
> >>> Best,
> >>> -- C
> >>>
> >>>
> >>>
> >>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <
> [email protected]>
> >>> wrote:
> >>>>
> >>>> "storage": {
> >>>> cp: {
> >>>>  type: "file",
> >>>>  connection: "classpath:///",
> >>>>  formats: {
> >>>>    "csv" : {
> >>>>      type: "text",
> >>>>      extensions: [ "csv" ],
> >>>>      delimiter: ","
> >>>>    }
> >>>>  }
> >>>>  enabled: true
> >>>> }
> >>>> }
> >>>> "storage": {
> >>>> dfs: {
> >>>>  type: "file",
> >>>>  connection: "file:///",
> >>>>  workspaces: {
> >>>>    "tmp": {
> >>>>      "location": "/tmp",
> >>>>      "writable": true,
> >>>>      "defaultInputFormat": null,
> >>>>      "allowAccessOutsideWorkspace": false
> >>>>    },
> >>>>    "home": {
> >>>>      "location": "/Users/stefan",
> >>>>      "writable": true,
> >>>>      "defaultInputFormat": null,
> >>>>      "allowAccessOutsideWorkspace": false
> >>>>    },
> >>>>    "root": {
> >>>>      "location": "/",
> >>>>      "writable": false,
> >>>>      "defaultInputFormat": null,
> >>>>      "allowAccessOutsideWorkspace": false
> >>>>    }
> >>>>  },
> >>>>  formats: {
> >>>>    "parquet": {
> >>>>      "type": "parquet"
> >>>>    },
> >>>>    "json": {
> >>>>      "type": "json",
> >>>>      "extensions": [
> >>>>        "json"
> >>>>      ]
> >>>>    }
> >>>>  },
> >>>>  enabled: true
> >>>> }
> >>>> }
> >>>> "storage": {
> >>>> s3: {
> >>>>  type: "file",
> >>>>  connection: "s3a://<my-bucket-name>",
> >>>>  config: {
> >>>>    "fs.s3a.aws.credentials.provider":
> >>>> "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
> >>>>    "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
> >>>>    "fs.s3a.impl.disable.cache": "false"
> >>>>  },
> >>>>  workspaces: {
> >>>>    "root": {
> >>>>      "location": "/",
> >>>>      "writable": false,
> >>>>      "defaultInputFormat": "parquet",
> >>>>      "allowAccessOutsideWorkspace": false
> >>>>    }
> >>>>  },
> >>>>  "formats": {
> >>>>    "parquet": {
> >>>>      "type": "parquet"
> >>>>    }
> >>>>  },
> >>>>  enabled: true
> >>>> }
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <[email protected]>
> wrote:
> >>>>
> >>>>> Can you share your configs with any sensitive info redacted?  The
> lists
> >>>>> don't support images, so please just cut/paste the json.
> >>>>> I had another idea...
> >>>>> -- C
> >>>>>
> >>>>>
> >>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <
> >>>>> [email protected]> wrote:
> >>>>>>
> >>>>>> Yes, I think I'm following these instructions. And the file is not
> >>>>>> completely ignored. It creates additional format definitions. Let's
> >>> say I
> >>>>>> white list some formats in my storage configuration and Drill adds
> more
> >>>>>> formats (which I don't want). Is there another way to start a
> "vanilla"
> >>>>>> Drill installation with my own configurations?
> >>>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <[email protected]>
> >>> wrote:
> >>>>>>
> >>>>>>> Hi Stefan,
> >>>>>>> My apologies.. Ok.. so the issue is that the
> >>>>> storage-plugins-override.conf
> >>>>>>> is being ignored.  I've never actually used this feature, so I
> wasn't
> >>>>>>> familiar with it, but are you following the instructions here [1]
> >>> with
> >>>>>>> respect to configuration and restarting Drill?  My suggestion
> would be
> >>>>> to
> >>>>>>> remove all the plugins in the UI and only specify them in the .conf
> >>>>> file.
> >>>>>>> Drill has an order of precedence and I suspect what is happening is
> >>> that
> >>>>>>> the UI versions have a higher priority than the .conf versions.
>  Does
> >>>>> that
> >>>>>>> make sense?
> >>>>>>>
> >>>>>>> -- C
> >>>>>>>
> >>>>>>> [1]:
> >>>>>>>
> >>>>>
> >>>
> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <
> >>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi Charles
> >>>>>>>>
> >>>>>>>> I use a "storage-plugins-override.conf" file. My attempt is to
> have
> >>> the
> >>>>>>>> configuration for my storages in a single file and Drill can pick
> up
> >>>>> the
> >>>>>>>> configuration on startup. I put "storage-plugins-override.conf" in
> >>> the
> >>>>>>> conf
> >>>>>>>> directory and Drill creates the storages on startup but (and that
> is
> >>> my
> >>>>>>>> problem) also creates all formats for every storage defined in my
> >>>>> config
> >>>>>>>> file. E.g. I have a (local) file type storage and I define two
> >>> formats
> >>>>>>>> (parquet and json) in it. Drill does not respect my restriction to
> >>> two
> >>>>>>>> formats in the config file but creates all formats known to Drill
> >>> (like
> >>>>>>>> iceberg, xml etc.).
> >>>>>>>>
> >>>>>>>> regards
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <[email protected]>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Stefan,
> >>>>>>>>> Thanks for your interest in Drill.  You have to define the format
> >>>>> config
> >>>>>>>>> for each storage plugin.  Otherwise Drill doesn't know what
> >>> extension
> >>>>> to
> >>>>>>>>> associate with what format plugin.  Out of curiosity, why are you
> >>>>> using
> >>>>>>> the
> >>>>>>>>> .conf files for this?
> >>>>>>>>> -- C
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <
> >>>>>>> [email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Not defining a format seems to prevent the user from querying
> the
> >>>>>>>>> specific
> >>>>>>>>>> format. E.g. after deleting the xml format definition in the web
> >>> gui,
> >>>>>>> I'm
> >>>>>>>>>> not able to query xml files anymore. So I guess my assumption
> was
> >>>>>>> right.
> >>>>>>>>>>
> >>>>>>>>>> Stefan
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <
> >>>>>>>>> [email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction.
> >>>>> Probably
> >>>>>>>>> I'm
> >>>>>>>>>>> wrong.
> >>>>>>>>>>>
> >>>>>>>>>>> Stefan
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <
> >>>>>>>>> [email protected]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the
> storage
> >>>>>>>>> plugins
> >>>>>>>>>>>> on startup. My storage configurations contain only one or two
> >>>>> formats
> >>>>>>>>>>>> (parquet, json, csv). Checking the storages in the web gui I
> >>>>> noticed
> >>>>>>>>> that
> >>>>>>>>>>>> for all the storages all formats are enabled, e.g. msaccess,
> >>>>> iceberg
> >>>>>>>>> etc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is this on purpose or did I do something wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Example configuration:
> >>>>>>>>>>>>
> >>>>>>>>>>>> "storage": {
> >>>>>>>>>>>> dfs: {
> >>>>>>>>>>>> type: "file",
> >>>>>>>>>>>> connection: "file:///",
> >>>>>>>>>>>> workspaces: {
> >>>>>>>>>>>> "tmp": {
> >>>>>>>>>>>>   "location": "/tmp",
> >>>>>>>>>>>>   "writable": true,
> >>>>>>>>>>>>   "defaultInputFormat": null,
> >>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
> >>>>>>>>>>>> },
> >>>>>>>>>>>> "root": {
> >>>>>>>>>>>>   "location": "/",
> >>>>>>>>>>>>   "writable": false,
> >>>>>>>>>>>>   "defaultInputFormat": null,
> >>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
> >>>>>>>>>>>> }
> >>>>>>>>>>>> },
> >>>>>>>>>>>> formats: {
> >>>>>>>>>>>> "parquet": {
> >>>>>>>>>>>>   "type": "parquet"
> >>>>>>>>>>>> },
> >>>>>>>>>>>> "json": {
> >>>>>>>>>>>>   "type": "json",
> >>>>>>>>>>>>   "extensions": [
> >>>>>>>>>>>>     "json"
> >>>>>>>>>>>>   ]
> >>>>>>>>>>>> }
> >>>>>>>>>>>> },
> >>>>>>>>>>>> enabled: true
> >>>>>>>>>>>> }
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> regards
> >>>>>>>>>>>> Stefan
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
>
