Hi Avner, Paul,

I was reading this and wondering:

1. Is it in fact true (I think it is) that Drill does not allow multiple queries to be submitted in one REST request? I seem to remember running into that issue when I was trying to do some of the Superset work.
2. If a user is required to be authenticated to execute a query, would that not prevent the possibility of a non-authenticated user executing arbitrary queries against someone else's data?
3. I would definitely create separate data sources for each tenant, but I don't know that it is necessary (or helpful) to create one for each query.
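To make point 3 concrete, here is a rough sketch of what a per-tenant data source could look like as a storage plugin config, built once per tenant rather than once per query. Everything specific in it is an assumption for illustration: the `tenant_<id>` naming scheme, the `data` workspace, the `/tenants/<id>` folder layout and the bucket name are all hypothetical, and the field names follow the usual file-system plugin layout, so they should be checked against your Drill version. The dict is meant to be the JSON body you would submit through Drill's storage plugin REST endpoint; no request is made here.

```python
import json
import re

# Hypothetical rule: tenant ids are short alphanumeric/underscore tokens, so
# they are safe to embed in a plugin name and in an S3 path.
_TENANT_ID = re.compile(r"[A-Za-z0-9_]{1,64}")


def tenant_plugin_payload(tenant_id: str, bucket: str) -> dict:
    """Build a storage plugin config restricted to one tenant's folder.

    The naming scheme and folder layout are assumptions, not Drill defaults.
    """
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    return {
        "name": f"tenant_{tenant_id}",
        "config": {
            "type": "file",
            "enabled": True,
            "connection": f"s3a://{bucket}",
            "workspaces": {
                # Single read-only workspace rooted at the tenant's folder.
                "data": {
                    "location": f"/tenants/{tenant_id}",
                    "writable": False,
                    "defaultInputFormat": "parquet",
                }
            },
            "formats": {"parquet": {"type": "parquet"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(tenant_plugin_payload("acme", "example-bucket"), indent=2))
```

Note that this only scopes the workspace; as Paul says below, Drill has no built-in way to restrict a given user to a given config, so the application still has to pick the plugin name itself and never take it from user input.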
I'd agree with Paul, that Drill's access model needs improvement and that would be a good addition to the project. We might be able to assist with that if there's interest.

Best,
-- C

> On May 10, 2020, at 3:55 PM, Paul Rogers <[email protected]> wrote:
>
> Hi Avner,
>
> Drill was designed for a system in which the user name maps to a certificate
> on the underlying file system, and the file system provides complete
> security. This model has not been extended to the cloud world.
>
> What you want is a way to authenticate your user, map the user to a storage
> plugin config for only that client's files, then restrict that user to only
> that config. Further, you'd want the config to obtain S3 keys from a vault of
> some sort. If you have that, you'd not have to worry about SQL injection
> since only an authorized user could muck with the SQL, and they could only
> access their own data -- which they can presumably access anyway.
>
> At present, Drill has no out-of-the-box security model for this use case;
> there is no mechanism to associate users with configs, or to externalize S3
> security keys. Such a system would be a worthwhile addition to the project.
>
> I wonder, has anyone else found a workaround for this use case? Maybe via
> Kerberos or some such?
>
> Thanks,
> - Paul
>
> On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy
> <[email protected]> wrote:
>
> Hi,
> I'm trying to use Apache Drill as a database for providing SQL over S3
> parquet files.
> Drill is used for serving multi-tenant data for multiple customers.
> Since I need to build the SQL string using the REST API I'm vulnerable to
> SQL injection attacks.
> I do test all user input and close it between apostrophes and
> escape apostrophe in the user input by doubling it but I'm still concerned
> about optional SQL attacks.
> Will adding a different data source (which points to a different folder on
> S3) per tenant is something that will have impact on performance? (I might
> have thousands of those)
> Does it make sense to create the data source on the fly before query?
> Is there another way to limit the sent SQL to a specific folder?
> Thanks,
> Avner
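
For reference, the escaping Avner describes (wrap the value in apostrophes and double any apostrophe inside it) can be sketched as below, together with the JSON body for a REST query. This is only an illustration under assumptions: the `tenant_<id>` plugin name, the `data` workspace, the `orders` table and the `customer_name` column are all hypothetical, and it assumes that doubling apostrophes is the only escaping a string literal needs, which should be verified against the Drill version in use. The `queryType` and `query` fields are the ones Drill's REST query endpoint accepts; the sketch builds the payload and does not send it.

```python
import re

# Hypothetical rule: the tenant id comes from the authenticated session, never
# from user input, and is still validated before being placed in the SQL.
_TENANT_ID = re.compile(r"[A-Za-z0-9_]{1,64}")


def sql_string_literal(value: str) -> str:
    """Quote a user-supplied value as a SQL string literal.

    Doubles every apostrophe and wraps the result in apostrophes.
    """
    return "'" + value.replace("'", "''") + "'"


def build_query_payload(tenant_id: str, customer_name: str) -> dict:
    """Build the JSON body for a Drill REST query scoped to one tenant."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    query = (
        f"SELECT * FROM `tenant_{tenant_id}`.`data`.`orders` "
        f"WHERE customer_name = {sql_string_literal(customer_name)}"
    )
    return {"queryType": "SQL", "query": query}


if __name__ == "__main__":
    print(build_query_payload("acme", "x' OR '1'='1")["query"])
```

The point of the split is that the only user-controlled text ends up inside a quoted literal, while the data source name is chosen by the application from the authenticated tenant.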
