Thanks Paul,
It seems I wasn't clear enough in my previous email.
I have a server in between the end users and drill (exactly as you
suggested), my concern is SQL attacks on that server by bad parameters.
I've tried Athena and I think they too don't support prepared statements.
In addition, they were slower than Drill in my scenarios.
If Drill had a utility escaping function that clients could use on their
input and know they are safe it would help reducing the risk.
For now, I guess it leaves me with the option of creating a data source per
each client (hoping few thousands of those won't hurt performance) or
making sure I properly escape all users input.
Appreciate your help.
Thanks,
  Avner

On Sun, May 10, 2020 at 9:09 PM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi Avner,
>
> Drill does not support prepared statements. Nor does Drill support
> statements with parameters. This is true with all interfaces. These would
> be great features; but they've never been implemented.
>
>
> Drill was designed to operate in a Hadoop-like environment with
> semi-trusted users. (Meaning that, if any user did something malicious, you
> could sue or fire them.) As noted, the file system enforced security. There
> was no notion of the public using Drill to access secure data, with Drill
> acting as the secure gateway. Again, code could be added, but it is not
> there today. FWIW, in its present state, I would not trust Drill on a
> public web site with sensitive data.
>
>
> Given how Drill acts today, I'd wager your best bet is to insert your own
> server between your user and Drill. Allow the user to specify queries in
> some simple, non-SQL way. Then, your server can build the SQL and forward
> it to Drill.
>
> Public Internet (Web Browser --> Secure Gateway) --> Private network (App
> server --> Drill)
>
>
> For example, if I want to know about "Orders", I can specify a date range.
> Your server fills in the storage plugin, table name, WHERE clause, etc. to
> create the SQL. Such an approach allows the Drill REST API is on a private
> IP address within your data center. Only your outward-facing user service
> is on a public IP. With this approach, there is no SQL injection risk
> because you do not directly use the web-provided info in a SQL statement.
> Of course, you have to build your SQL statement correctly, as you are
> doing. Don't just append web text to a SQL statement.
>
>
> I don't think this is unique to Drill. I'd be surprised if most people
> allow, say, public access to their HBase, Cassandra or MySQL DBs.
>
>
> Thanks,
> - Paul
>
>
>
>     On Sunday, May 10, 2020, 5:29:13 PM PDT, Avner Levy <
> avner.l...@gmail.com> wrote:
>
>  Hi Charles, Paul,
> Thanks for your answers.
> I'm interested in a case, where there is a Rest service which
> authenticate the service's users, get a request with the user parameters
> and build from it the SQL sent to Drill.
> The customers are identified by some account ID and they send for example
> the name of an entity they are looking for as a parameter in the service's
> REST request.
> Then the service can build the select (just an example):
> SELECT x FROM S3.db.`data/[CUSTOMER_ID]/data.parquet` where
> name='CUSTOMER_USER_INPUT]'
>
> In such case, they can still send in CUSTOMER_USER_INPUT the following: "x'
> union SELECT x FROM S3.db.`data/[OTHER_CUSTOMER_ID]/data.parquet`".
>
> Usually such stuff are solved with prepared statements, but I believe this
> isn't supported over REST.
> I would prefer not having to authenticate my end users to Drill since this
> creates more work and complexity.
> Is there a way to have prepared statements in Drill?
> Is it supported in other protocols? (JDBC/ODBC)
> Limiting the query folder outside the SQL would do the job as well.
> Any feedback is appreciated,
> Thanks,
>   Avner
>
>
> On Sun, May 10, 2020 at 5:57 PM Paul Rogers <par0...@yahoo.com.invalid>
> wrote:
>
> > Hi Charles,
> >
> > One of the changes I was looking at was allowing multiple SQL statements
> > per REST request to get around the lack of session. The idea would be to
> > issue a number of ALTER SESSION, CTTAS, USE and similar statements
> followed
> > by a single query that returns data.
> >
> >
> > A better solution is to enable session support for the REST API. We
> > discussed the challenges involved due the disconnected nature of HTTP
> > requests.
> >
> > Another good improvement would be a SQL command way to create configs,
> not
> > just JSON editing. That way it would be easier to automate creation of a
> > config. Also, it would be handy to be able to externalize configs so they
> > can be stored in locations other than ZK (or local disk, in embedded
> mode.)
> > For this use case, a query for user "X" would work against the "s3-X"
> > config would could be retrieved from an external system that knows the
> > mapping from user X to the S3 files visible to X, and the security tokens
> > to use for that user.
> >
> > The question for now, however, is how to do this with the code that
> exists
> > in Drill 1.17. I'm hoping someone has worked out a solution.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Sunday, May 10, 2020, 1:05:50 PM PDT, Charles Givre <
> > cgi...@gmail.com> wrote:
> >
> >  Hi Avner, Paul,
> > I was reading this and wondering:
> >
> > 1.  Is it in fact true (I think it is) that Drill does not allow multiple
> > queries to be submitted in one REST request?  I seem to remember running
> > into that issue when I was trying to do some of the Superset work.
> > 2.  If a user is required to be authenticated to execute a query, would
> > that not prevent the possibility of a non-authenticated user executing
> > arbitrary queries against someone else's data?
> > 3.  I would definitely create separate data sources for each tenant, but
> I
> > don't know that it is necessary (or helpful) to create one for each
> query.
> >
> > I'd agree with Paul, that Drill's access model needs improvement and that
> > would be a good addition to the project.  We might be able to assist with
> > that if there's interest.
> > Best,
> > -- C
> >
> >
> > > On May 10, 2020, at 3:55 PM, Paul Rogers <par0...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Avner,
> > >
> > > Drill was designed for a system in which the user name maps to a
> > certificate on the underlying file system, and the file system provides
> > complete security. This model has not been extended to the cloud world.
> > >
> > > What you want is a way to authenticate your user, map the user to a
> > storage plugin config for only that client's files, then restrict that
> user
> > to only that config. Further, you'd want the config to obtain S3 keys
> from
> > a vault of some sort. If you have that, you'd not have to worry about SQL
> > injection since only an authorized user could muck with the SQL, and they
> > could only access their own data -- which they can presumably access
> anyway.
> > >
> > >
> > > At present, Drill has no out-of-the-box security model for this use
> > case; there is no mechanism to associate users with configs, or to
> > externalize S3 security keys. Such a system would be a worthwhile
> addition
> > to the project.
> > >
> > > I wonder, has anyone else found a workaround for this use case? Maybe
> > via Kerberos or some such?
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >    On Sunday, May 10, 2020, 12:04:16 PM PDT, Avner Levy <
> > avner.l...@gmail.com> wrote:
> > >
> > > Hi,
> > > I'm trying to use Apache Drill as a database for providing SQL over S3
> > > parquet files.
> > > Drill is used for serving multi-tenant data for multiple customers.
> > > Since I need to build the SQL string using the REST API I'm vulnerable
> to
> > > SQL injection attacks.
> > > I do test all user input and close it between apostrophes and
> > > escape apostrophe in the user input by doubling it but I'm still
> > concerned
> > > about optional SQL attacks.
> > > Will adding a different data source (which points to a different folder
> > on
> > > S3) per tenant is something that will have impact on performance? (I
> > might
> > > have thousands of those)
> > > Does it make sense to create the data source on the fly before query?
> > > Is there another way to limit the sent SQL to a specific folder?
> > > Thanks,
> > >  Avner
> >
>

Reply via email to