Hi Nitin,

I am executing the SQL query on a drillbit node using drill-conf.
We have configured a 5-node Drill cluster external to Amazon, with 32 GB of
RAM. From one of the nodes, we use the drill-conf utility to fire the SQL
query.

One observation I had concerns these two queries:
select * from `xxx.tsv`
select * from `xxx.tsv` where yyy = 'zzz'

Both queries take almost the same time on 1 GB of data with 1,000,000
rows. This suggests that network transfer, rather than query execution, is
the dominant cost. Is the entire data set first transferred to the Drill
cluster, with the query executed on the cluster afterwards?
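To make that hypothesis concrete, here is a toy model (not Drill's actual text reader) of a scan without filter pushdown. Every line of the file is read before the WHERE predicate is applied, so the filtered and unfiltered queries read the same number of bytes and differ only in how many rows they return. The function scan_tsv and the sample data are hypothetical.

```python
import io


def scan_tsv(stream, predicate=None):
    """Hypothetical full scan of a TSV stream with no filter pushdown.

    Every line is read (and its bytes counted) before the optional
    predicate runs, so bytes_read does not depend on the filter.
    """
    header = stream.readline()
    bytes_read = len(header)
    columns = header.decode("utf-8").rstrip("\n").split("\t")
    rows = []
    for raw in stream:
        bytes_read += len(raw)  # cost is paid whether or not the row matches
        values = raw.decode("utf-8").rstrip("\n").split("\t")
        row = dict(zip(columns, values))
        if predicate is None or predicate(row):
            rows.append(row)
    return rows, bytes_read


data = b"id\tyyy\n1\tzzz\n2\taaa\n3\tzzz\n"

# Equivalent of: select * from `xxx.tsv`
all_rows, full_cost = scan_tsv(io.BytesIO(data))
# Equivalent of: select * from `xxx.tsv` where yyy = 'zzz'
filtered, filtered_cost = scan_tsv(io.BytesIO(data), lambda r: r["yyy"] == "zzz")

print(len(all_rows), full_cost)       # 3 25
print(len(filtered), filtered_cost)   # 2 25
```

In this model the WHERE clause only reduces the rows returned, not the bytes read, which would match both queries taking roughly the same time if reading from S3 dominates.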

Regards,
Projjwal

On Mon, Feb 20, 2017 at 6:18 PM, Nitin Pawar <nitinpawar...@gmail.com>
wrote:

> How are you running the select * query: through the Drill UI or sqlline?
> Where are you running it from?
> Is Drill hosted in AWS or on your local machine?
>
> If the Drill server is on AWS, I think the majority of the time is spent
> displaying the result set rather than querying the file.
> If the Drill server is local, it may be your network, which can take a
> long time depending on where the S3 bucket is relative to your Drill server.
>
> On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hello all,
> >
> > I am querying 1 GB of data in a .tsv file stored in Amazon S3, using
> > Drill 1.8 with its default configuration and the out-of-the-box S3
> > storage plugin. The drillbits are configured on a 5-node cluster with
> > 32 GB of RAM and 4 vCPUs.
> >
> > I see that a select * from xxx; query takes 23 minutes to fetch
> > 1,040,000 rows.
> >
> > Is this the expected behaviour?
> > I am looking for any quick tuning that could improve performance, or
> > any other suggestions.
> >
> > Attached is the JSON profile for this query.
> >
> > Regards,
> > Projjwal
> >
>
>
>
> --
> Nitin Pawar
>
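For reference, the figures from the original message (1,040,000 rows in 23 minutes, roughly 1 GB) work out to the effective throughput below. This is a back-of-the-envelope check, not a measurement, and it assumes "1 GB" means 10^9 bytes; it is mainly useful for comparing against the bandwidth available between the S3 bucket and the drillbits, or between the drillbits and the client.

```python
# Figures quoted in the original message.
rows = 1_040_000
elapsed_s = 23 * 60           # 23 minutes, in seconds
size_bytes = 1_000_000_000    # assumption: "1 GB" read as 10**9 bytes

rows_per_s = rows / elapsed_s
mb_per_s = size_bytes / elapsed_s / 1_000_000  # decimal megabytes per second

print(f"{rows_per_s:.0f} rows/s, {mb_per_s:.2f} MB/s")  # 754 rows/s, 0.72 MB/s
```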
