You can use the Drill EMR bootstrap script to launch a cluster with Drill running: https://github.com/awslabs/emr-bootstrap-actions/tree/master/drill
You'd still need to configure S3 access; some notes are here: http://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/

On Fri, Feb 27, 2015 at 11:00 AM, Andries Engelbrecht <[email protected]> wrote:

> 1. Yes. You can run Drill & ZK separately from the Hadoop environment,
> and it works on AWS.
>
> 2. I have not used it with Amazon EMR; maybe others can comment. Why are
> you looking at EMR and Drill vs spinning up instances with Drill & ZK?
> Drill does not work like Hive, which needs underlying MR to execute
> queries; Drill is an execution engine itself.
>
> 3. The locality of the Drill instances and the S3 storage will be key.
> It is advisable to keep both in the same AWS region and DC to get good
> performance. I have not experimented enough with the JSON data file
> size, but you probably want to balance JSON file size against the
> number of files for best behavior. Perhaps start with 32MB JSON files,
> scale to 64/128 (and maybe 256MB), and see how it performs with S3.
>
> Andries
>
>
> On Feb 25, 2015, at 11:17 AM, Mihai Stoicescu <[email protected]> wrote:
>
> > Hello,
> >
> > My name is Mihai Stoicescu and I am trying to experiment with Apache
> > Drill.
> >
> > I have several questions that I hope you can help me answer:
> >
> > 1. Can Drill & ZooKeeper work outside a Hadoop environment?
> >
> > 2. What configuration steps would I need to take to enable Drill
> > with Amazon EMR?
> >
> > 3. If I want to keep the data inside S3 as JSON files, do you have
> > any recommendations in terms of setup and performance?
> >
> >
> > Thank you,
> >
> > Mihai Stoicescu
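
On the file-size suggestion in Andries's reply (start around 32MB per JSON file and scale up), here is a rough Python sketch of one way to regroup records into files of roughly a chosen size before uploading them to S3. It assumes the data is stored as one JSON record per line. The names chunk_json_lines and write_chunks, and the output file naming, are hypothetical helpers for illustration and not part of Drill or EMR. Uploading to S3 is left out.

```python
import json
from typing import Iterable, Iterator, List

MB = 1024 * 1024


def chunk_json_lines(lines: Iterable[str],
                     target_bytes: int = 32 * MB) -> Iterator[List[str]]:
    """Group one-record-per-line JSON into chunks of roughly target_bytes."""
    chunk: List[str] = []
    size = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        json.loads(line)  # fail early on a malformed record
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        # Close the current chunk before it would exceed the target size.
        if chunk and size + line_bytes > target_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk


def write_chunks(lines: Iterable[str], prefix: str,
                 target_bytes: int = 32 * MB) -> List[str]:
    """Write each chunk to its own file (hypothetical naming: prefix-00000.json)."""
    paths = []
    for i, chunk in enumerate(chunk_json_lines(lines, target_bytes)):
        path = f"{prefix}-{i:05d}.json"
        with open(path, "w", encoding="utf-8") as out:
            out.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

For example, calling write_chunks(open("events.json", encoding="utf-8"), "events-part") would write files of about 32MB each; changing target_bytes to 64 * MB or 128 * MB makes it easy to compare how each size performs against S3. A single record larger than the target still gets a file of its own.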
