Re: Using Drill with EMR

Daniil Osipov Mon, 02 Mar 2015 10:00:49 -0800

You can use the Drill EMR bootstrap script to launch a cluster with Drill
running:
https://github.com/awslabs/emr-bootstrap-actions/tree/master/drill


You'd still need to configure S3 access; some notes are here:
http://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/
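Roughly, the S3 side comes down to a file-type storage plugin whose connection is an S3 URI. Below is a minimal sketch that builds such a plugin config. It assumes the `s3n://` scheme that was common at the time, and the field names of Drill's "file" plugin (`type`, `connection`, `workspaces`, `formats`). The bucket name `my-json-bucket` and the helper `s3_storage_plugin` are hypothetical. Check the exact fields against your Drill version's web UI. Credentials normally go in Hadoop's core-site.xml (`fs.s3n.awsAccessKeyId` / `fs.s3n.awsSecretAccessKey`), not in the plugin.

```python
import json


def s3_storage_plugin(bucket: str, workspace_dir: str = "/") -> dict:
    """Build a Drill file-system storage plugin config pointing at an S3 bucket.

    Assumption: field names follow Drill's "file" plugin format and the
    s3n:// scheme; verify both against the Drill version you run.
    """
    return {
        "type": "file",
        "enabled": True,
        # Hypothetical bucket; AWS credentials are expected in core-site.xml.
        "connection": f"s3n://{bucket}",
        "workspaces": {
            "root": {
                "location": workspace_dir,
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        # Only JSON is registered, since the data is kept as JSON files.
        "formats": {
            "json": {"type": "json", "extensions": ["json"]}
        },
    }


if __name__ == "__main__":
    # The printed JSON can be pasted into the Storage tab of the Drill web UI.
    print(json.dumps(s3_storage_plugin("my-json-bucket"), indent=2))
```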

On Fri, Feb 27, 2015 at 11:00 AM, Andries Engelbrecht <[email protected]> wrote:

> 1. Yes. You can run Drill & ZK separately from a Hadoop environment, and it
> works on AWS.
>
> 2. I have not used it with Amazon EMR; maybe others can comment. Why are
> you looking at EMR and Drill rather than spinning up instances with Drill &
> ZK? Drill does not work like Hive, which needs underlying MapReduce to
> execute queries; Drill is an execution engine itself.
>
> 3. The relative location of the Drill instances and the S3 storage will be
> key. For good performance, it is advisable to keep both in the same AWS
> region and data center. I have not experimented enough with JSON data file
> sizes, but you probably want to balance JSON file size against the number
> of files for best behavior. Perhaps start with 32MB JSON files, scale to
> 64MB and 128MB (and maybe 256MB), and see how it performs with S3.
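To make the file-size trade-off in point 3 concrete, it helps to see how many files a dataset turns into at each candidate size. This is a small sketch. The helper `file_counts` and the 10 GB dataset size are hypothetical illustrations, not figures from this thread.

```python
def file_counts(dataset_mb: int, sizes_mb=(32, 64, 128, 256)) -> dict:
    """Return the number of JSON files needed at each target file size (MB).

    Uses ceiling division so a partial final file still counts as one file.
    """
    return {size: -(-dataset_mb // size) for size in sizes_mb}


if __name__ == "__main__":
    # Hypothetical 10 GB (10240 MB) dataset.
    for size, count in file_counts(10 * 1024).items():
        print(f"{size:>4} MB files -> {count} files")
```

Roughly speaking, smaller files give Drill more units of work to spread across Drillbits but add more per-object S3 requests, and larger files do the reverse. That is why measuring at each size, as suggested above, is the practical way to choose.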
>
> —Andries
>
>
>
> On Feb 25, 2015, at 11:17 AM, Mihai Stoicescu <[email protected]> wrote:
>
> > Hello,
> >
> > My name is Mihai Stoicescu and I am trying to experiment with Apache
> > Drill.
> >
> > I have several questions that I hope you can help me answer:
> >
> >       1. Can Drill & ZooKeeper work outside a Hadoop environment?
> >
> >       2. What configuration steps would I need to take to enable Drill
> > with Amazon EMR?
> >
> >       3. If I want to keep the data inside S3 as JSON files, do you have
> > any recommendations in terms of setup and performance?
> >
> >
> >   Thank you,
> >
> > Mihai Stoicescu
>
>
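Andries' first point, running Drill and ZooKeeper on plain AWS instances outside Hadoop, mostly comes down to pointing every Drillbit at the same ZooKeeper quorum in `conf/drill-override.conf`. Below is a minimal sketch that renders that file. It assumes the `drill.exec` / `cluster-id` / `zk.connect` keys found in the stock file that ships with Drill. The helper `drill_override`, the cluster ID, and the hostnames are hypothetical placeholders.

```python
def drill_override(cluster_id: str, zk_hosts: list[str], zk_port: int = 2181) -> str:
    """Render a minimal drill-override.conf for a standalone Drill cluster.

    Drillbits that share the same cluster-id and zk.connect string find each
    other through ZooKeeper; no Hadoop services are involved.
    """
    zk_connect = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{zk_connect}"\n'
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical private hostnames of three EC2 instances running ZooKeeper.
    print(drill_override("drillbits1", ["zk1.internal", "zk2.internal", "zk3.internal"]))
```

The same rendered file would be copied to every instance that runs a Drillbit.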
