Hi Avner,
Drill was designed for a system in which the user name maps to a certificate on
the underlying file system, and the file system provides complete security.
This model has not been extended to the cloud world.
What you want is a way to authenticate your user, map the user to a storage
plugin config for only that client's files, then restrict that user to only
that config. Further, you'd want the config to obtain S3 keys from a vault of
some sort. If you have that, you'd not have to worry about SQL injection since
only an authorized user could muck with the SQL, and they could only access
their own data -- which they can presumably access anyway.
At present, Drill has no out-of-the-box security model for this use case; there
is no mechanism to associate users with configs, or to externalize S3 security
keys. Such a system would be a worthwhile addition to the project.
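Until something like that exists, one option might be to keep the
tenant-to-config mapping in your own service in front of Drill. Below is a
rough sketch of that idea, not a tested implementation. The naming scheme
(s3_<tenant>), the "data" workspace, and the helper names are all made up for
illustration; the payload follows the shape Drill's REST API takes for
POST /storage/{name}.json. S3 credentials are deliberately left out: they would
have to come from your vault or from the cluster configuration. Note that this
only builds the config; Drill itself still would not stop a user from naming
another tenant's plugin, so your service would have to enforce that.

```python
import json


def plugin_name(tenant_id: str) -> str:
    """Derive a storage plugin name from a tenant id (hypothetical scheme)."""
    # Accept only letters and digits so the id can never carry SQL or path syntax.
    if not tenant_id.isalnum():
        raise ValueError("tenant id must be alphanumeric")
    return f"s3_{tenant_id}"


def build_plugin_config(tenant_id: str, bucket: str, folder: str) -> dict:
    """Build a file-type storage plugin definition limited to one tenant folder."""
    return {
        "name": plugin_name(tenant_id),
        "config": {
            "type": "file",
            "connection": f"s3a://{bucket}",
            "workspaces": {
                # Hypothetical read-only workspace rooted at the tenant's folder.
                "data": {
                    "location": "/" + folder.strip("/"),
                    "writable": False,
                    "defaultInputFormat": "parquet",
                }
            },
            "formats": {"parquet": {"type": "parquet"}},
            "enabled": True,
        },
    }


def plugin_endpoint(base_url: str, tenant_id: str) -> str:
    """URL to which the definition would be POSTed as JSON."""
    return f"{base_url.rstrip('/')}/storage/{plugin_name(tenant_id)}.json"


def plugin_request_body(tenant_id: str, bucket: str, folder: str) -> str:
    """Serialize the definition for the request body."""
    return json.dumps(build_plugin_config(tenant_id, bucket, folder))
```

Your service would then build every query against `s3_<tenant>`.`data` for the
authenticated tenant only, rather than accepting a plugin name from the client.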
I wonder, has anyone else found a workaround for this use case? Maybe via
Kerberos or some such?
Thanks,
- Paul
On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy <[email protected]>
wrote:
Hi,
I'm trying to use Apache Drill as a database for providing SQL over Parquet
files stored on S3.
Drill is used for serving multi-tenant data for multiple customers.
Since I need to build the SQL string myself when using the REST API, I'm
vulnerable to SQL injection attacks.
I do validate all user input, enclose it in apostrophes, and escape any
apostrophe in the input by doubling it, but I'm still concerned about
possible SQL injection attacks.
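For reference, the escaping described above amounts to roughly the following.
This is a minimal sketch with a made-up helper name, and it assumes the SQL
dialect treats a doubled apostrophe as the only escape inside a string literal;
if backslashes or other sequences are also special, this is not sufficient.

```python
def quote_sql_literal(value: str) -> str:
    """Wrap user input in apostrophes, doubling any apostrophe it contains."""
    if "\x00" in value:
        # Reject NUL bytes outright rather than trying to escape them.
        raise ValueError("NUL character not allowed in SQL literal")
    return "'" + value.replace("'", "''") + "'"


def build_query(table: str, customer_name: str) -> str:
    """Build a query with the user value embedded as a quoted literal.

    `table` must come from trusted code, never from user input,
    because this helper only protects the literal value.
    """
    return f"SELECT * FROM {table} WHERE name = {quote_sql_literal(customer_name)}"
```

This protects values only; it does nothing for table or folder names, which is
why I'm asking about limiting the query to a specific folder.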
Would adding a separate data source per tenant (each pointing to a different
folder on S3) have an impact on performance? (I might have thousands of
those.)
Does it make sense to create the data source on the fly before the query?
Is there another way to restrict the submitted SQL to a specific folder?
Thanks,
Avner