Hi Nitin,

I am executing the SQL query on a drillbit node using drill-conf. We have configured a 5-node Drill cluster outside Amazon, with 32 GB RAM. From one of the nodes, we use the drill-conf utility to submit the SQL query.
One observation I had is that these two queries:

select * from `xxx.tsv`
select * from `xxx.tsv` where yyy = 'zzz'

take almost the same time for 1 GB of data with 1,000,000 rows. So if network transfer is the main time-consuming component compared with query execution, does that mean the entire data set is first transferred to the Drill cluster, and only then is the query executed on the Drill cluster?

Regards,
Projjwal

On Mon, Feb 20, 2017 at 6:18 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> How are you running select * ..., using the Drill UI or sqlline?
> Where are you running it from?
> Is Drill hosted in AWS or on your local machine?
>
> If the Drill server is on AWS, I think the majority of the time is spent
> displaying the result set rather than querying the file.
> If the Drill server is local, then it might be your network that takes
> a lot of time, depending on the S3 bucket location and where your Drill
> server is.
>
> On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hello all,
> >
> > I am using 1 GB of data in the form of a .tsv file, stored in Amazon S3,
> > with Drill 1.8. I am using the default Drill configuration with the S3
> > storage plugin that comes out of the box. The drillbits are configured on
> > a 5-node cluster with 32 GB RAM and 4 vCPUs.
> >
> > I see that the select * from xxx; query takes 23 mins to fetch
> > 1,040,000 rows.
> >
> > Is this the expected behaviour?
> > I am looking for any quick tuning that can improve the performance, or
> > any other suggestions.
> >
> > Attached is the JSON profile for this query.
> >
> > Regards,
> > Projjwal
> >
> >
>
> --
> Nitin Pawar
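To put the numbers in this thread in perspective, here is a rough back-of-the-envelope throughput check using only the figures quoted above (about 1 GB of TSV, 1,040,000 rows, about 23 minutes). It is a minimal sketch, assuming 1 GB is counted as 1024 MB (the thread does not say which convention), and the helper name `throughput` is hypothetical, not part of Drill.

```python
from typing import Tuple

# Figures taken from the original post in this thread.
# Assumption: "1 GB" is treated as 1024 MB; the thread does not specify.
DATA_MB = 1024          # ~1 GB of .tsv data stored in S3
ROWS = 1_040_000        # rows returned by the select * query
ELAPSED_S = 23 * 60     # ~23 minutes, converted to seconds


def throughput(data_mb: float, rows: int, elapsed_s: float) -> Tuple[float, float]:
    """Hypothetical helper: return (MB per second, rows per second) for one run."""
    return data_mb / elapsed_s, rows / elapsed_s


mb_per_s, rows_per_s = throughput(DATA_MB, ROWS, ELAPSED_S)
print(f"{mb_per_s:.2f} MB/s, {rows_per_s:.0f} rows/s")  # prints "0.74 MB/s, 754 rows/s"
```

A rate below 1 MB/s for the whole query, together with the filtered and unfiltered queries taking about the same time, is consistent with the run being dominated by reading the file from S3 or by delivering and displaying results, rather than by the filter itself; the attached query profile is the place to confirm which operator accounts for the time.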