Hi Charles,

One of the changes I was looking at was allowing multiple SQL statements per 
REST request to get around the lack of session. The idea would be to issue a 
number of ALTER SESSION, CTTAS, USE and similar statements followed by a single 
query that returns data.
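
To make the idea concrete, here is a rough sketch of what such a request body might look like. The "statements" field is purely hypothetical and is not part of the Drill 1.17 REST API; only the "queryType" field follows the existing /query.json request shape. The list of setup prefixes is also an assumption, taken from the examples above.

```python
import json
from typing import List

# Statement prefixes that prepare state rather than return data.
# Assumption: this list just mirrors the examples named in this message.
SETUP_PREFIXES = ("ALTER SESSION", "CREATE TEMPORARY TABLE", "USE")


def is_setup(statement: str) -> bool:
    """Return True if the statement only prepares session state."""
    return statement.strip().upper().startswith(SETUP_PREFIXES)


def build_batch(setup: List[str], query: str) -> str:
    """Build a hypothetical multi-statement request body.

    The "statements" field does not exist in Drill 1.17. It only
    illustrates the proposal: setup statements first, one query last.
    """
    for stmt in setup:
        if not is_setup(stmt):
            raise ValueError(f"not a setup statement: {stmt!r}")
    if is_setup(query):
        raise ValueError("the final statement must be a query that returns data")
    return json.dumps({"queryType": "SQL", "statements": [*setup, query]})
```

The check on the final statement reflects the "single query that returns data" constraint; a real implementation would have to decide this on the server side rather than by prefix matching.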


A better solution is to enable session support for the REST API. We discussed 
the challenges involved due to the disconnected nature of HTTP requests.

Another good improvement would be a way to create configs with a SQL command, 
not just by editing JSON. That would make it easier to automate creation of a 
config. Also, it would be handy to be able to externalize configs so they can 
be stored in locations other than ZK (or local disk, in embedded mode). For 
this use case, a query for user "X" would run against the "s3-X" config, which 
could be retrieved from an external system that knows the mapping from user X 
to the S3 files visible to X, and the security tokens to use for that user.
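
As a sketch of that lookup, assuming a hypothetical external mapping (here just an in-memory dict) and made-up bucket and key values: the returned structure is modeled on the JSON used for Drill's file-system storage plugin configs as I understand it, so the field names should be checked against the Drill docs before relying on them. None of the function names below exist in Drill.

```python
from typing import Dict

# Hypothetical stand-in for the external system that knows which S3
# location and credentials belong to each user. Not a Drill API.
EXTERNAL_MAPPING: Dict[str, Dict[str, str]] = {
    "X": {"bucket": "tenant-x-bucket", "access_key": "AK_X", "secret_key": "SK_X"},
}


def config_name(user: str) -> str:
    """Name of the per-user storage config, e.g. "s3-X" for user "X"."""
    return f"s3-{user}"


def resolve_config(user: str) -> dict:
    """Look up the user's S3 details and build a storage config for them."""
    entry = EXTERNAL_MAPPING.get(user)
    if entry is None:
        raise KeyError(f"no storage mapping for user {user!r}")
    return {
        "name": config_name(user),
        "config": {
            "type": "file",
            "enabled": True,
            "connection": f"s3a://{entry['bucket']}",
            # Assumed Hadoop S3A property names for the credentials.
            "config": {
                "fs.s3a.access.key": entry["access_key"],
                "fs.s3a.secret.key": entry["secret_key"],
            },
            "workspaces": {
                "root": {"location": "/", "writable": False,
                         "defaultInputFormat": "parquet"},
            },
            "formats": {"parquet": {"type": "parquet"}},
        },
    }
```

The point of the sketch is only that the user name, not the SQL text, selects the config and the credentials.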

The question for now, however, is how to do this with the code that exists in 
Drill 1.17. I'm hoping someone has worked out a solution.


Thanks,
- Paul

 

On Sunday, May 10, 2020, 1:05:50 PM PDT, Charles Givre <cgi...@gmail.com> wrote:
 
 Hi Avner, Paul, 
I was reading this and wondering:

1.  Is it in fact true (I think it is) that Drill does not allow multiple 
queries to be submitted in one REST request?  I seem to remember running into 
that issue when I was trying to do some of the Superset work.
2.  If a user is required to be authenticated to execute a query, would that 
not prevent the possibility of a non-authenticated user executing arbitrary 
queries against someone else's data?
3.  I would definitely create separate data sources for each tenant, but I 
don't know that it is necessary (or helpful) to create one for each query.  

I'd agree with Paul, that Drill's access model needs improvement and that would 
be a good addition to the project.  We might be able to assist with that if 
there's interest.
Best,
-- C


> On May 10, 2020, at 3:55 PM, Paul Rogers <par0...@yahoo.com.INVALID> wrote:
> 
> Hi Avner,
> 
> Drill was designed for a system in which the user name maps to a certificate 
> on the underlying file system, and the file system provides complete 
> security. This model has not been extended to the cloud world.
> 
> What you want is a way to authenticate your user, map the user to a storage 
> plugin config for only that client's files, then restrict that user to only 
> that config. Further, you'd want the config to obtain S3 keys from a vault of 
> some sort. If you have that, you'd not have to worry about SQL injection 
> since only an authorized user could muck with the SQL, and they could only 
> access their own data -- which they can presumably access anyway.
> 
> 
> At present, Drill has no out-of-the-box security model for this use case; 
> there is no mechanism to associate users with configs, or to externalize S3 
> security keys. Such a system would be a worthwhile addition to the project.
> 
> I wonder, has anyone else found a workaround for this use case? Maybe via 
> Kerberos or some such?
> 
> 
> Thanks,
> - Paul
> 
> 
> 
> On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy <avner.l...@gmail.com> wrote:
> 
> Hi,
> I'm trying to use Apache Drill as a database for providing SQL over S3
> parquet files.
> Drill is used for serving multi-tenant data for multiple customers.
> Since I need to build the SQL string myself when using the REST API, I'm
> vulnerable to SQL injection attacks.
> I do validate all user input, enclose it in apostrophes, and escape any
> apostrophe in the user input by doubling it, but I'm still concerned
> about possible SQL injection attacks.
> Will adding a different data source (which points to a different folder on
> S3) per tenant have an impact on performance? (I might have thousands of
> those.)
> Does it make sense to create the data source on the fly before each query?
> Is there another way to limit the submitted SQL to a specific folder?
> Thanks,
>  Avner
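
For reference, the escaping approach described in the original question can be sketched as follows. This assumes that doubling apostrophes is the only escaping a Drill string literal needs, which should be verified against the Drill documentation. The table name `orders`, the column name `customer`, and both function names are hypothetical; the request body follows the /query.json shape.

```python
import json


def quote_literal(value: str) -> str:
    """Wrap a value in apostrophes, doubling any apostrophe inside it."""
    return "'" + value.replace("'", "''") + "'"


def build_query_payload(tenant_workspace: str, customer: str) -> str:
    """Build a /query.json request body for one tenant.

    tenant_workspace must be chosen by the server from the authenticated
    tenant, never taken from user input. `orders` and `customer` are
    hypothetical names used only for illustration.
    """
    sql = (
        f"SELECT * FROM {tenant_workspace}.`orders` "
        f"WHERE customer = {quote_literal(customer)}"
    )
    return json.dumps({"queryType": "SQL", "query": sql})
```

Escaping only protects the literal values; restricting which folder a query can read still depends on the tenant-to-config mapping discussed in the replies.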
  
