The config for dfs in the UI looks like this:
{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "home": {
      "location": "/Users/stefan",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": ["json"]
    },
    "excel": {
      "type": "excel",
      "extensions": ["xlsx"],
      "lastRow": 1048576,
      "ignoreErrors": true,
      "maxArraySize": -1,
      "thresholdBytesForTempFiles": -1
    },
    "spss": {
      "type": "spss",
      "extensions": ["sav"]
    },
    "iceberg": {
      "type": "iceberg",
      "properties": null,
      "caseSensitive": null,
      "includeColumnStats": null,
      "ignoreResiduals": null,
      "snapshotId": null,
      "snapshotAsOfTime": null,
      "fromSnapshotId": null,
      "toSnapshotId": null
    },
    "httpd": {
      "type": "httpd",
      "extensions": ["httpd"],
      "logFormat": "common\ncombined"
    },
    "xml": {
      "type": "xml",
      "extensions": ["xml"],
      "dataLevel": 1
    },
    "syslog": {
      "type": "syslog",
      "extensions": ["syslog"],
      "maxErrors": 10
    },
    "msaccess": {
      "type": "msaccess",
      "extensions": ["mdb", "accdb"]
    },
    "hdf5": {
      "type": "hdf5",
      "extensions": ["h5"],
      "defaultPath": null
    },
    "ltsv": {
      "type": "ltsv",
      "extensions": ["ltsv"],
      "parseMode": "lenient",
      "escapeCharacter": null,
      "kvDelimiter": null,
      "entryDelimiter": null,
      "lineEnding": null,
      "quoteChar": null
    },
    "delta": {
      "type": "delta",
      "version": null,
      "timestamp": null
    },
    "shp": {
      "type": "shp",
      "extensions": ["shp"]
    },
    "image": {
      "type": "image",
      "extensions": [
        "jpg", "jpeg", "jpe", "tif", "tiff", "dng", "psd", "png", "bmp",
        "gif", "ico", "pcx", "wav", "wave", "avi", "webp", "mov", "mp4",
        "m4a", "m4p", "m4b", "m4r", "m4v", "3gp", "3g2", "eps", "epsf",
        "epsi", "ai", "arw", "crw", "cr2", "nef", "orf", "raf", "rw2",
        "rwl", "srw", "x3f"
      ],
      "fileSystemMetadata": true,
      "descriptive": true
    },
    "pdf": {
      "type": "pdf",
      "extensions": ["pdf"],
      "extractHeaders": true,
      "extractionAlgorithm": "basic"
    },
    "sas": {
      "type": "sas",
      "extensions": ["sas7bdat"]
    },
    "pcap": {
      "type": "pcap",
      "extensions": ["pcap", "pcapng"]
    }
  },
  "authMode": "SHARED_USER",
  "enabled": true
}
I'm now able to query some XML data: "SELECT * FROM
dfs.home.`ch.so.afu.abbaustellen.xml`;", which I actually don't want to
allow (see the formats in the "storage-plugins-override.conf" file). If I
remove the xml format section from the config in the UI, I can no longer
query the XML file: "Error: VALIDATION ERROR: From line 1, column 15 to
line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
'dfs.home'".
regards
Stefan
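Since removing a format from the UI config is what actually stops Drill from querying that file type, one stopgap is to take the plugin JSON shown in the UI, drop every format that is not on an allow-list, and paste the result back. A minimal Python sketch, assuming the UI JSON has been copied into a dict; `restrict_formats` and `ALLOWED_FORMATS` are hypothetical names. Whether the removed entries stay removed after the next restart (with the override file still in place) would need to be checked separately.

```python
import json

# Hypothetical allow-list: the formats the override file was meant to permit.
ALLOWED_FORMATS = {"parquet", "json"}


def restrict_formats(plugin_config: dict, allowed: set) -> dict:
    """Return a copy of a file storage plugin config that keeps only the
    format entries whose names are in `allowed`."""
    restricted = dict(plugin_config)  # shallow copy; the input dict is not modified
    restricted["formats"] = {
        name: fmt
        for name, fmt in plugin_config.get("formats", {}).items()
        if name in allowed
    }
    return restricted


if __name__ == "__main__":
    # Abbreviated version of the dfs config copied from the UI.
    ui_config = {
        "type": "file",
        "connection": "file:///",
        "formats": {
            "parquet": {"type": "parquet"},
            "json": {"type": "json", "extensions": ["json"]},
            "xml": {"type": "xml", "extensions": ["xml"], "dataLevel": 1},
            "iceberg": {"type": "iceberg"},
        },
        "enabled": True,
    }
    # Prints the config with only the parquet and json formats left.
    print(json.dumps(restrict_formats(ui_config, ALLOWED_FORMATS), indent=2))
```

The copy is shallow on purpose: only the `formats` key is replaced, so workspaces and other settings pass through unchanged.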
On Mon, Jul 10, 2023 at 9:15 PM Charles Givre wrote:
> Hi Stefan,
> What's in the config in the UI? Can you also please clarify which queries
> you are running that indicate your configs aren't working?
> Best,
> -- C
>
>
>
> > On Jul 10, 2023, at 1:11 PM, Stefan Ziegler wrote:
> >
> > "storage": {
> >   cp: {
> >     type: "file",
> >     connection: "classpath:///",
> >     formats: {
> >       "csv" : {
> >         type: "text",
> >         extensions: [ "csv" ],
> >         delimiter: ","
> >       }
> >     }
> >     enabled: true
> >   }
> > }
> > "storage": {
> >   dfs: {
> >     type: "file",
> >     connection: "file:///",
> >     workspaces: {
> >       "tmp": {
> >         "location": "/tmp",
> >         "writable": true,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       },
> >       "home": {
> >         "location": "/Users/stefan",
> >         "writable": true,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       },
> >       "root": {
> >         "location": "/",
> >         "writable": false,
> >         "defaultInputFormat": null,
> >         "allowAccessOutsideWorkspace": false
> >       }
> >     },
> >     formats: {
> >       "parquet": {
> >         "type": "parquet"
> >       },
> >       "json": {
> >         "type": "json",
> >         "extensions": ["json"]
> >       }
> >     },
> >     enabled: true
> >   }
> > }
> > "storage": {
> >   s3: {
> >     type: "file",
> >     connection: "s3a://<my-bucket-name>",
> >     config: {
> >       "fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
> >       "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
> >       "fs.s3a.impl.disable.cache": "false"
> >     },
> >     workspaces: {
> >       "root": {
> >         "location": "/",
> >         "writable": false,
> >         "defaultInputFormat": "parquet",
> >         "allowAccessOutsideWorkspace": false
> >       }
> >     },
> >     "formats": {
> >       "parquet": {
> >         "type": "parquet"
> >       }
> >     },
> >     enabled: true
> >   }
> > }
> >
> >
> >
> >
> > On Mon, Jul 10, 2023 at 6:40 PM Charles Givre wrote:
> >
> >> Can you share your configs with any sensitive info redacted? The mailing
> >> lists don't support images, so please just copy and paste the JSON.
> >> I had another idea...
> >> -- C
> >>
> >>
> >>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler wrote:
> >>>
> >>> Yes, I think I'm following these instructions. And the file is not
> >>> completely ignored, but Drill creates additional format definitions.
> >>> Let's say I whitelist some formats in my storage configuration; Drill
> >>> then adds more formats (which I don't want). Is there another way to
> >>> start a "vanilla" Drill installation with my own configurations?
> >>>
> >>> Stefan
> >>>
> >>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre wrote:
> >>>
> >>>> Hi Stefan,
> >>>> My apologies. OK, so the issue is that the storage-plugins-override.conf
> >>>> file is being ignored. I've never actually used this feature, so I wasn't
> >>>> familiar with it, but are you following the instructions here [1] with
> >>>> respect to configuration and restarting Drill? My suggestion would be to
> >>>> remove all the plugins in the UI and only specify them in the .conf file.
> >>>> Drill has an order of precedence, and I suspect that the UI versions have
> >>>> a higher priority than the .conf versions. Does that make sense?
> >>>>
> >>>> -- C
> >>>>
> >>>> [1]: https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
> >>>>
> >>>>
> >>>>
> >>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler wrote:
> >>>>>
> >>>>> Hi Charles
> >>>>>
> >>>>> I use a "storage-plugins-override.conf" file. My goal is to keep the
> >>>>> configuration for my storage plugins in a single file that Drill picks
> >>>>> up on startup. I put "storage-plugins-override.conf" in the conf
> >>>>> directory, and Drill creates the storage plugins on startup, but (and
> >>>>> that is my problem) it also creates all formats for every storage plugin
> >>>>> defined in my config file. For example, I have a (local) file-type
> >>>>> storage plugin and define two formats (parquet and json) in it. Drill
> >>>>> does not respect my restriction to two formats in the config file but
> >>>>> creates all formats known to Drill (like iceberg, xml, etc.).
> >>>>>
> >>>>> regards
> >>>>> Stefan
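To see exactly which formats Drill added on top of the ones in the override file, the two `formats` maps can be compared by name. A small Python sketch; `extra_formats` is a hypothetical helper, and both configs are assumed to be loaded as dicts already (the override file is HOCON rather than strict JSON, so its `formats` section would first have to be transcribed or parsed with a HOCON library).

```python
def extra_formats(configured: dict, effective: dict) -> list:
    """Names of formats present in the effective (UI) plugin config but not
    in the configuration that was supplied, sorted for stable output."""
    supplied = set(configured.get("formats", {}))
    shown = set(effective.get("formats", {}))
    return sorted(shown - supplied)


if __name__ == "__main__":
    # Formats requested for dfs in storage-plugins-override.conf.
    configured = {"formats": {"parquet": {}, "json": {}}}
    # A few of the formats the UI shows for dfs after startup.
    effective = {"formats": {"parquet": {}, "json": {}, "xml": {},
                             "iceberg": {}, "msaccess": {}}}
    print(extra_formats(configured, effective))  # ['iceberg', 'msaccess', 'xml']
```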
> >>>>>
> >>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre wrote:
> >>>>>
> >>>>>> Hi Stefan,
> >>>>>> Thanks for your interest in Drill. You have to define the format config
> >>>>>> for each storage plugin; otherwise Drill doesn't know which extension to
> >>>>>> associate with which format plugin. Out of curiosity, why are you using
> >>>>>> the .conf files for this?
> >>>>>> -- C
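The point about extensions can be illustrated with a deliberately simplified model: a file name is matched against the `extensions` lists of the configured formats, and if nothing matches, the file is not recognised as a table. That lines up with the "Object ... not found" error reported once the xml format is removed. This is only a sketch of the idea, not Drill's actual matching code; `resolve_format` is hypothetical, extensions are compared case-insensitively here as an assumption, and formats without an explicit `extensions` list (parquet, iceberg, delta in the UI config) are simply skipped, although Drill handles those differently.

```python
from typing import Optional


def resolve_format(filename: str, formats: dict) -> Optional[str]:
    """Return the name of the first format whose `extensions` list contains
    the file's extension, or None if no configured format claims it."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[1].lower()
    for name, fmt in formats.items():
        # Simplification: formats without an "extensions" list never match.
        if ext in (e.lower() for e in fmt.get("extensions", [])):
            return name
    return None


if __name__ == "__main__":
    with_xml = {
        "json": {"type": "json", "extensions": ["json"]},
        "xml": {"type": "xml", "extensions": ["xml"]},
    }
    without_xml = {"json": {"type": "json", "extensions": ["json"]}}
    print(resolve_format("ch.so.afu.abbaustellen.xml", with_xml))     # xml
    print(resolve_format("ch.so.afu.abbaustellen.xml", without_xml))  # None
```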
> >>>>>>
> >>>>>>
> >>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler wrote:
> >>>>>>>
> >>>>>>> Not defining a format seems to prevent the user from querying that
> >>>>>>> format. For example, after deleting the xml format definition in the
> >>>>>>> web GUI, I'm no longer able to query xml files. So I guess my
> >>>>>>> assumption was right.
> >>>>>>>
> >>>>>>> Stefan
> >>>>>>>
> >>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler wrote:
> >>>>>>>
> >>>>>>>> Btw: I assumed that the list of formats acts as a restriction. Probably
> >>>>>>>> I'm wrong.
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler wrote:
> >>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>>
> >>>>>>>>> I'm using storage-plugins-override.conf to configure the storage
> >>>>>>>>> plugins on startup. My storage configurations contain only one or two
> >>>>>>>>> formats each (parquet, json, csv). Checking the storage plugins in the
> >>>>>>>>> web GUI, I noticed that all formats are enabled for every storage
> >>>>>>>>> plugin, e.g. msaccess, iceberg, etc.
> >>>>>>>>>
> >>>>>>>>> Is this on purpose or did I do something wrong?
> >>>>>>>>>
> >>>>>>>>> Example configuration:
> >>>>>>>>>
> >>>>>>>>> "storage": {
> >>>>>>>>>   dfs: {
> >>>>>>>>>     type: "file",
> >>>>>>>>>     connection: "file:///",
> >>>>>>>>>     workspaces: {
> >>>>>>>>>       "tmp": {
> >>>>>>>>>         "location": "/tmp",
> >>>>>>>>>         "writable": true,
> >>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>       },
> >>>>>>>>>       "root": {
> >>>>>>>>>         "location": "/",
> >>>>>>>>>         "writable": false,
> >>>>>>>>>         "defaultInputFormat": null,
> >>>>>>>>>         "allowAccessOutsideWorkspace": false
> >>>>>>>>>       }
> >>>>>>>>>     },
> >>>>>>>>>     formats: {
> >>>>>>>>>       "parquet": {
> >>>>>>>>>         "type": "parquet"
> >>>>>>>>>       },
> >>>>>>>>>       "json": {
> >>>>>>>>>         "type": "json",
> >>>>>>>>>         "extensions": ["json"]
> >>>>>>>>>       }
> >>>>>>>>>     },
> >>>>>>>>>     enabled: true
> >>>>>>>>>   }
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> regards
> >>>>>>>>> Stefan
> >>>>>>>>>