Thanks Paul, It seems I wasn't clear enough in my previous email. I have a server in between the end users and drill (exactly as you suggested), my concern is SQL attacks on that server by bad parameters. I've tried Athena and I think they too don't support prepared statements. In addition, they were slower than Drill in my scenarios. If Drill had a utility escaping function that clients could use on their input and know they are safe it would help reducing the risk. For now, I guess it leaves me with the option of creating a data source per each client (hoping few thousands of those won't hurt performance) or making sure I properly escape all users input. Appreciate your help. Thanks, Avner
On Sun, May 10, 2020 at 9:09 PM Paul Rogers <par0...@yahoo.com.invalid> wrote: > Hi Avner, > > Drill does not support prepared statements. Nor does Drill support > statements with parameters. This is true with all interfaces. These would > be great features; but they've never been implemented. > > > Drill was designed to operate in a Hadoop-like environment with > semi-trusted users. (Meaning that, if any user did something malicious, you > could sue or fire them.) As noted, the file system enforced security. There > was no notion of the public using Drill to access secure data, with Drill > acting as the secure gateway. Again, code could be added, but it is not > there today. FWIW, in its present state, I would not trust Drill on a > public web site with sensitive data. > > > Given how Drill acts today, I'd wager your best bet is to insert your own > server between your user and Drill. Allow the user to specify queries in > some simple, non-SQL way. Then, your server can build the SQL and forward > it to Drill. > > Public Internet (Web Browser --> Secure Gateway) --> Private network (App > server --> Drill) > > > For example, if I want to know about "Orders", I can specify a date range. > Your server fills in the storage plugin, table name, WHERE clause, etc. to > create the SQL. Such an approach allows the Drill REST API is on a private > IP address within your data center. Only your outward-facing user service > is on a public IP. With this approach, there is no SQL injection risk > because you do not directly use the web-provided info in a SQL statement. > Of course, you have to build your SQL statement correctly, as you are > doing. Don't just append web text to a SQL statement. > > > I don't think this is unique to Drill. I'd be surprised if most people > allow, say, public access to their HBase, Cassandra or MySQL DBs. > > > Thanks, > - Paul > > > > On Sunday, May 10, 2020, 5:29:13 PM PDT, Avner Levy < > avner.l...@gmail.com> wrote: > > Hi Charles, Paul, > Thanks for your answers. > I'm interested in a case, where there is a Rest service which > authenticate the service's users, get a request with the user parameters > and build from it the SQL sent to Drill. > The customers are identified by some account ID and they send for example > the name of an entity they are looking for as a parameter in the service's > REST request. > Then the service can build the select (just an example): > SELECT x FROM S3.db.`data/[CUSTOMER_ID]/data.parquet` where > name='CUSTOMER_USER_INPUT]' > > In such case, they can still send in CUSTOMER_USER_INPUT the following: "x' > union SELECT x FROM S3.db.`data/[OTHER_CUSTOMER_ID]/data.parquet`". > > Usually such stuff are solved with prepared statements, but I believe this > isn't supported over REST. > I would prefer not having to authenticate my end users to Drill since this > creates more work and complexity. > Is there a way to have prepared statements in Drill? > Is it supported in other protocols? (JDBC/ODBC) > Limiting the query folder outside the SQL would do the job as well. > Any feedback is appreciated, > Thanks, > Avner > > > On Sun, May 10, 2020 at 5:57 PM Paul Rogers <par0...@yahoo.com.invalid> > wrote: > > > Hi Charles, > > > > One of the changes I was looking at was allowing multiple SQL statements > > per REST request to get around the lack of session. The idea would be to > > issue a number of ALTER SESSION, CTTAS, USE and similar statements > followed > > by a single query that returns data. > > > > > > A better solution is to enable session support for the REST API. We > > discussed the challenges involved due the disconnected nature of HTTP > > requests. > > > > Another good improvement would be a SQL command way to create configs, > not > > just JSON editing. That way it would be easier to automate creation of a > > config. Also, it would be handy to be able to externalize configs so they > > can be stored in locations other than ZK (or local disk, in embedded > mode.) > > For this use case, a query for user "X" would work against the "s3-X" > > config would could be retrieved from an external system that knows the > > mapping from user X to the S3 files visible to X, and the security tokens > > to use for that user. > > > > The question for now, however, is how to do this with the code that > exists > > in Drill 1.17. I'm hoping someone has worked out a solution. > > > > > > Thanks, > > - Paul > > > > > > > > On Sunday, May 10, 2020, 1:05:50 PM PDT, Charles Givre < > > cgi...@gmail.com> wrote: > > > > Hi Avner, Paul, > > I was reading this and wondering: > > > > 1. Is it in fact true (I think it is) that Drill does not allow multiple > > queries to be submitted in one REST request? I seem to remember running > > into that issue when I was trying to do some of the Superset work. > > 2. If a user is required to be authenticated to execute a query, would > > that not prevent the possibility of a non-authenticated user executing > > arbitrary queries against someone else's data? > > 3. I would definitely create separate data sources for each tenant, but > I > > don't know that it is necessary (or helpful) to create one for each > query. > > > > I'd agree with Paul, that Drill's access model needs improvement and that > > would be a good addition to the project. We might be able to assist with > > that if there's interest. > > Best, > > -- C > > > > > > > On May 10, 2020, at 3:55 PM, Paul Rogers <par0...@yahoo.com.INVALID> > > wrote: > > > > > > Hi Avner, > > > > > > Drill was designed for a system in which the user name maps to a > > certificate on the underlying file system, and the file system provides > > complete security. This model has not been extended to the cloud world. > > > > > > What you want is a way to authenticate your user, map the user to a > > storage plugin config for only that client's files, then restrict that > user > > to only that config. Further, you'd want the config to obtain S3 keys > from > > a vault of some sort. If you have that, you'd not have to worry about SQL > > injection since only an authorized user could muck with the SQL, and they > > could only access their own data -- which they can presumably access > anyway. > > > > > > > > > At present, Drill has no out-of-the-box security model for this use > > case; there is no mechanism to associate users with configs, or to > > externalize S3 security keys. Such a system would be a worthwhile > addition > > to the project. > > > > > > I wonder, has anyone else found a workaround for this use case? Maybe > > via Kerberos or some such? > > > > > > > > > Thanks, > > > - Paul > > > > > > > > > > > > On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy < > > avner.l...@gmail.com> wrote: > > > > > > Hi, > > > I'm trying to use Apache Drill as a database for providing SQL over S3 > > > parquet files. > > > Drill is used for serving multi-tenant data for multiple customers. > > > Since I need to build the SQL string using the REST API I'm vulnerable > to > > > SQL injection attacks. > > > I do test all user input and close it between apostrophes and > > > escape apostrophe in the user input by doubling it but I'm still > > concerned > > > about optional SQL attacks. > > > Will adding a different data source (which points to a different folder > > on > > > S3) per tenant is something that will have impact on performance? (I > > might > > > have thousands of those) > > > Does it make sense to create the data source on the fly before query? > > > Is there another way to limit the sent SQL to a specific folder? > > > Thanks, > > > Avner > > >