The first two lines of the log show that the bulk of the time was lost before the query even entered the actual planning stage:

2017-03-07 06:27:28,074 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 274166de-f543-3fa7-ef9e-8e9e87d5d6a0: select columns[0] from dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv` where columns[0] ='41' and columns[3] ='568'
2017-03-07 06:28:00,775 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1

More than 30 secs is unaccounted for. Can you turn on the root logger to be at the debug level and retry the explain plan?

Kunal Khatua

________________________________
From: rahul challapalli <challapallira...@gmail.com>
Sent: Tuesday, March 7, 2017 5:24:43 AM
To: user
Subject: Re: Minimise query plan time for dfs plugin for local file system on tsv file

I did not get a chance to review the log file. However the next thing I would try is to make your cluster a single node cluster first and then run the same explain plan query separately on each individual file.

On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" <proj.s...@gmail.com> wrote:

> Hi Rahul,
>
> thanks for your suggestions. However, I am still not able to see any
> reduction in query planning time
> by explicit column names, removing extract headers and using columns[index]
>
> As I said, I ran explain plan and it's taking 30+ secs for me.
> My data is 1 GB tsv split into 20 files of 5 MB each.
> Each 5MB file has close to 50k records
> It's a 5 node cluster, and width per node is 4
> Therefore, total number of minor fragments for one major fragment is 20
> I have copied the source directory in all the drillbit nodes
>
> can you tell me a reasonable time estimate which I can expect drill to
> return result for query for the above described scenario.
> Query is - select columns[0] from
> dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv`
> where columns[0] ='41' and columns[3] ='568'
>
> attached is the json profile
> and the drillbit.log
>
> I also have the tracing enabled.
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
> org.apache.drill.exec.work.foreman.Foreman
> however i see the duration of various steps in the order of ms in the logs.
> i am not sure where planning time of the order of 30 secs is consumed.
>
> Please help
>
> Regards,
> Projjwal
>
> On Mon, Mar 6, 2017 at 11:23 PM, rahul challapalli <challapallira...@gmail.com> wrote:
>
>> You can try the below things. For each of the below check the planning
>> time individually
>>
>> 1. Run explain plan for a simple "select * from `/scratch/localdisk/drill/testdata/Cust_1G_tsv`"
>> 2. Replace the '*' in your query with explicit column names
>> 3. Remove the extract header from your storage plugin configuration and
>> from your data files? Rewrite your query to use, columns[0_based_index]
>> instead of explicit column names
>>
>> Also how many columns do you have in your text files and what is the size
>> of each file? Like gautam suggested, it would be good to take a look at
>> drillbit.log file (from the foreman node where planning occurred) and the
>> query profile as well.
>>
>> - Rahul
>>
>> On Mon, Mar 6, 2017 at 9:30 AM, Gautam Parai <gpa...@mapr.com> wrote:
>>
>> > Can you please provide the drillbit.log file?
>> >
>> > Gautam
>> >
>> > ________________________________
>> > From: PROJJWAL SAHA <proj.s...@gmail.com>
>> > Sent: Monday, March 6, 2017 1:45:38 AM
>> > To: user@drill.apache.org
>> > Subject: Fwd: Minimise query plan time for dfs plugin for local file
>> > system on tsv file
>> >
>> > all, please help me in giving suggestions on what areas i can look into
>> > why the query planning time is taking so long for files which are local
>> > to the drill machines. I have the same directory structure copied on all
>> > the 5 nodes of the cluster. I am accessing the source files using out of
>> > the box dfs storage plugin.
>> >
>> > Query planning time is approx 30 secs
>> > Query execution time is approx 1.5 secs
>> >
>> > Regards,
>> > Projjwal
>> >
>> > ---------- Forwarded message ----------
>> > From: PROJJWAL SAHA <proj.s...@gmail.com>
>> > Date: Fri, Mar 3, 2017 at 5:06 PM
>> > Subject: Minimise query plan time for dfs plugin for local file system
>> > on tsv file
>> > To: user@drill.apache.org
>> >
>> > Hello all,
>> >
>> > I am querying select * from dfs.xxx where yyy (filter condition)
>> >
>> > I am using dfs storage plugin that comes out of the box from drill on a
>> > 1GB file, local to the drill cluster.
>> > The 1GB file is split into 10 files of 100 MB each.
>> > As expected I see 11 minor and 2 major fragments.
>> > The drill cluster is 5 nodes cluster with 4 cores, 32 GB each.
>> >
>> > One observation is that the query plan time is more than 30 seconds. I
>> > ran the explain plan query to validate this.
>> > The query execution time is 2 secs.
>> > total time taken is 32secs
>> >
>> > I wanted to understand how can i minimise the query plan time.
>> > Suggestions ?
>> > Is the time taken described above expected ?
>> > Attached is result from explain plan query
>> >
>> > Regards,
>> > Projjwal
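
As a quick check on the gap between the two drillbit.log lines quoted at the top of this thread, the timestamps can be subtracted directly. This is only a minimal sketch and not part of Drill: it assumes every log line begins with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in those two lines, the helper names `parse_ts` and `gap_seconds` are hypothetical, and the log lines are abbreviated to their first few fields.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each drillbit.log line,
# e.g. "2017-03-07 06:27:28,074" (23 characters).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LOG_TS_LENGTH = 23


def parse_ts(line):
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(line[:LOG_TS_LENGTH], LOG_TS_FORMAT)


def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between two log lines (hypothetical helper)."""
    return (parse_ts(later_line) - parse_ts(earlier_line)).total_seconds()


# The two log lines from the thread, abbreviated after the logger name.
FIRST = "2017-03-07 06:27:28,074 [...:foreman] INFO o.a.drill.exec.work.foreman.Foreman"
SECOND = "2017-03-07 06:28:00,775 [...:foreman] INFO o.a.d.exec.store.dfs.FileSelection"

print(f"{gap_seconds(FIRST, SECOND):.3f} s")  # prints "32.701 s"
```

Running the same subtraction over every consecutive pair of lines in a debug-level log would be one way to see which step accounts for the missing 30+ seconds.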