Hi Avner, Paul, 
I was reading this and wondering:

1.  Is it in fact true (I think it is) that Drill does not allow multiple 
queries to be submitted in one REST request?  I seem to remember running into 
that issue when I was trying to do some of the Superset work.
2.  If a user is required to be authenticated to execute a query, would that 
not prevent the possibility of a non-authenticated user executing arbitrary 
queries against someone else's data?
3.  I would definitely create separate data sources for each tenant, but I 
don't know that it is necessary (or helpful) to create one for each query.  
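
To make point 3 concrete, here is a minimal Python sketch of choosing the data source from the authenticated tenant rather than from anything the user types. The tenant IDs, plugin names, and the build_query_payload helper are all hypothetical, and it assumes one pre-created storage plugin per tenant. The JSON body uses the queryType/query fields of Drill's REST query endpoint; the table naming is an assumption to adapt to your workspaces.

```python
import json
import re

# Hypothetical mapping: authenticated tenant ID -> pre-created storage plugin name.
TENANT_PLUGINS = {
    "acme": "s3_acme",
    "globex": "s3_globex",
}

# Only plain identifiers are accepted as table names (an assumption for this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_query_payload(tenant_id: str, table: str) -> str:
    """Return a JSON request body whose query is scoped to the tenant's plugin."""
    # The plugin comes from the server-side mapping, never from user input.
    plugin = TENANT_PLUGINS[tenant_id]  # raises KeyError for unknown tenants
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"illegal table name: {table!r}")
    sql = f"SELECT * FROM `{plugin}`.`{table}` LIMIT 10"
    return json.dumps({"queryType": "SQL", "query": sql})
```

The point of the design is that the user can only influence which table is read inside their own plugin, not which plugin is used.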

I agree with Paul that Drill's access model needs improvement, and that such 
work would be a good addition to the project.  We might be able to assist with 
that if there's interest.
Best,
-- C


> On May 10, 2020, at 3:55 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi Avner,
> 
> Drill was designed for a system in which the user name maps to a certificate 
> on the underlying file system, and the file system provides complete 
> security. This model has not been extended to the cloud world.
> 
> What you want is a way to authenticate your user, map the user to a storage 
> plugin config for only that client's files, then restrict that user to only 
> that config. Further, you'd want the config to obtain S3 keys from a vault of 
> some sort. If you have that, you'd not have to worry about SQL injection 
> since only an authorized user could muck with the SQL, and they could only 
> access their own data -- which they can presumably access anyway.
> 
> 
> At present, Drill has no out-of-the-box security model for this use case; 
> there is no mechanism to associate users with configs, or to externalize S3 
> security keys. Such a system would be a worthwhile addition to the project.
> 
> I wonder, has anyone else found a workaround for this use case? Maybe via 
> Kerberos or some such?
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy 
> <[email protected]> wrote:  
> 
> Hi,
> I'm trying to use Apache Drill as a database for providing SQL over S3
> parquet files.
> Drill is used for serving multi-tenant data for multiple customers.
> Since I have to build the SQL string myself when using the REST API, I'm
> vulnerable to SQL injection attacks.
> I validate all user input, enclose it in apostrophes, and escape any
> apostrophe in the input by doubling it, but I'm still concerned about
> possible SQL attacks.
> Would adding a separate data source (pointing to a different folder on S3)
> per tenant affect performance? (I might have thousands of those.)
> Does it make sense to create the data source on the fly before each query?
> Is there another way to limit the sent SQL to a specific folder?
> Thanks,
>   Avner
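
For reference, the apostrophe-doubling described in the quoted message can be sketched in Python as follows. The quote_literal name is hypothetical, and the sketch assumes the apostrophe is the only character that needs escaping inside a string literal, which is worth verifying against the Drill version in use.

```python
def quote_literal(value: str) -> str:
    """Wrap a value in apostrophes, doubling any apostrophe it contains."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical usage: the user-supplied value cannot close the literal early.
user_input = "x' OR '1'='1"
sql = "SELECT * FROM t WHERE name = " + quote_literal(user_input)
```

This only protects values used as string literals; identifiers such as table or plugin names need their own validation.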
