Please share your SQL query and the query plan. To get the query plan, execute EXPLAIN PLAN FOR <your-SQL-query>;
Thanks, Khurram ________________________________ From: Divya Gehlot <divya.htco...@gmail.com> Sent: Friday, August 18, 2017 7:15:18 AM To: user@drill.apache.org Subject: Re: Query Optimization Hi , Yes its the same query its just the ran the metadata refresh command . My understanding is metadata refresh command saves reading the metadata. How about column values ... Why is it reading all the files after metedata refresh ? Partition helps to retrieve data faster . Like in hive how it happens when you mention the partition column in where condition it just goes and read and improves the query performace . In my query also I where conidtion has partioning column it should go and read those partitioned files right ? Why is it taking more time ? Does the Drill works in different way compare to hive ? Thanks, Divya On 18 August 2017 at 07:37, Padma Penumarthy <ppenumar...@mapr.com> wrote: > It might read all those files if some new data gets added after running > refresh metadata cache. > If everything is same before and after metadata refresh i.e. no > new data added and query is exactly the same, then it should not do that. > Also, check if you can partition in a way that will not create so many > files in the > first place. > > Thanks, > Padma > > > > On Aug 16, 2017, at 10:54 PM, Divya Gehlot <divya.htco...@gmail.com> > wrote: > > > > Hi, > > Another observation is > > My query had where conditions based on the partition values > > > > Total number of parquet files in directory - 102290 > >> Before Metadata refresh - Its reading only 4 files > >> After metadata refresh - its reading 102290 files > > > > > > This is how the refresh metadata works I mean it scans each and every > files > > and get the results ? > > > > I dont have access to logs now . > > > > Thanks, > > Divya > > > > On 17 August 2017 at 13:48, Divya Gehlot <divya.htco...@gmail.com> > wrote: > > > >> Hi, > >> Another observation is > >> My query had where conditions based on the partition values > >> Before Metadata refresh - Its reading only 4 files > >> After metadata refresh - its reading 102290 files > >> > >> Thanks, > >> Divya > >> > >> On 17 August 2017 at 13:03, Padma Penumarthy <ppenumar...@mapr.com> > wrote: > >> > >>> Does your query have partition filter ? > >>> Execution time is increased most likely because partition pruning is > not > >>> happening. > >>> Did you get a chance to look at the logs ? That might give some clues. > >>> > >>> Thanks, > >>> Padma > >>> > >>> > >>>> On Aug 16, 2017, at 9:32 PM, Divya Gehlot <divya.htco...@gmail.com> > >>> wrote: > >>>> > >>>> Hi, > >>>> Even I am surprised . > >>>> I am running Drill version 1.10 on MapR enterprise version. > >>>> *Query *- Selecting all the columns on partitioned parquet table > >>>> > >>>> I observed few things from Query statistics : > >>>> > >>>> Value > >>>> > >>>> Before Refresh Metadata > >>>> > >>>> After Refresh Metadata > >>>> > >>>> Fragments > >>>> > >>>> 1 > >>>> > >>>> 13 > >>>> > >>>> DURATION > >>>> > >>>> 01 min 0.233 sec > >>>> > >>>> 18 min 0.744 sec > >>>> > >>>> PLANNING > >>>> > >>>> 59.818 sec > >>>> > >>>> 33.087 sec > >>>> > >>>> QUEUED > >>>> > >>>> Not Available > >>>> > >>>> Not Available > >>>> > >>>> EXECUTION > >>>> > >>>> 0.415 sec > >>>> > >>>> 17 min 27.657 sec > >>>> > >>>> The planning time is being reduced by approx 60% but the execution > time > >>>> increased drastically. > >>>> I would like to understand why the exceution time increases after the > >>>> metadata refresh . > >>>> > >>>> > >>>> Appreciate the help. > >>>> > >>>> Thanks, > >>>> divya > >>>> > >>>> > >>>> On 17 August 2017 at 11:54, Padma Penumarthy <ppenumar...@mapr.com> > >>> wrote: > >>>> > >>>>> Refresh table metadata should help reduce query planning time. > >>>>> It is odd that it went up after you did refresh table metadata. > >>>>> Did you check the logs to see what is happening ? You might have to > >>>>> turn on some debugs if needed. > >>>>> BTW, what version of Drill are you running ? > >>>>> > >>>>> Thanks, > >>>>> Padma > >>>>> > >>>>> > >>>>>> On Aug 16, 2017, at 8:15 PM, Divya Gehlot <divya.htco...@gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> I have data in parquet file format . > >>>>>> when I run the query the data and see the execution plan I could see > >>>>>> following > >>>>>> statistics > >>>>>> > >>>>>>> TOTAL FRAGMENTS: 1 > >>>>>>>> DURATION: 01 min 0.233 sec > >>>>>>>> PLANNING: 59.818 sec > >>>>>>>> QUEUED: Not Available > >>>>>>>> EXECUTION: 0.415 sec > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> As its a paquet file format I tried enabling refresh meta data > >>>>>> and run below command > >>>>>> REFRESH TABLE METADATA <path to table> ; > >>>>>> then run the same query again on the same table same data (no > changes > >>> in > >>>>>> data) and could find the statistics as show below : > >>>>>> > >>>>>> TOTAL FRAGMENTS: 13 > >>>>>>>> DURATION: 14 min 14.604 sec > >>>>>>>> PLANNING: 33.087 sec > >>>>>>>> QUEUED: Not Available > >>>>>>>> EXECUTION: Not Available > >>>>>>> > >>>>>>> > >>>>>> The query is still running . > >>>>>> > >>>>>> Can somebody help me understand why the query taking so long once I > >>>>> issue > >>>>>> the refresh metadata command. > >>>>>> > >>>>>> Aprreciate the help ! > >>>>>> > >>>>>> Thanks, > >>>>>> Divya > >>>>> > >>>>> > >>> > >>> > >> > >