Re: Query Optimization

Khurram Faraaz Thu, 17 Aug 2017 20:10:36 -0700

Please share your SQL query and the query plan.

To get the query plan, execute EXPLAIN PLAN FOR <your-SQL-query>;



Thanks,

Khurram

________________________________
From: Divya Gehlot <divya.htco...@gmail.com>
Sent: Friday, August 18, 2017 7:15:18 AM
To: user@drill.apache.org
Subject: Re: Query Optimization

Hi ,
Yes its the same query its just the ran the metadata refresh command .
My understanding is metadata refresh command saves reading the metadata.
How about column values ... Why is it reading all the files after metedata
refresh ?
Partition helps to retrieve data faster .
Like in hive how it happens when you mention the partition column in where
condition
it just goes and read and improves the query performace .
In my query also I where conidtion has  partioning column it should go and
read those partitioned files right ?
Why is it taking more time ?
Does the Drill works in different way compare to hive ?


Thanks,
Divya

On 18 August 2017 at 07:37, Padma Penumarthy <ppenumar...@mapr.com> wrote:

> It might read all those files if some new data gets added after running
> refresh metadata cache.
> If everything is same before and after metadata refresh i.e. no
> new data added and query is exactly the same, then it should not do that.
> Also, check if you can partition in  a way that will not create so many
> files in the
> first place.
>
> Thanks,
> Padma
>
>
> > On Aug 16, 2017, at 10:54 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
> >
> > Hi,
> > Another observation is
> > My query had where conditions based on the partition values
> >
> > Total number of parquet files in directory  - 102290
> >> Before Metadata refresh - Its reading only 4 files
> >> After metadata refresh - its reading 102290 files
> >
> >
> > This is how the refresh metadata works I mean it scans each and every
> files
> > and get the results ?
> >
> > I dont  have access to logs now .
> >
> > Thanks,
> > Divya
> >
> > On 17 August 2017 at 13:48, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
> >
> >> Hi,
> >> Another observation is
> >> My query had where conditions based on the partition values
> >> Before Metadata refresh - Its reading only 4 files
> >> After metadata refresh - its reading 102290 files
> >>
> >> Thanks,
> >> Divya
> >>
> >> On 17 August 2017 at 13:03, Padma Penumarthy <ppenumar...@mapr.com>
> wrote:
> >>
> >>> Does your query have partition filter ?
> >>> Execution time is increased most likely because partition pruning is
> not
> >>> happening.
> >>> Did you get a chance to look at the logs ?  That might give some clues.
> >>>
> >>> Thanks,
> >>> Padma
> >>>
> >>>
> >>>> On Aug 16, 2017, at 9:32 PM, Divya Gehlot <divya.htco...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>> Even I am surprised .
> >>>> I am running Drill version 1.10  on MapR enterprise version.
> >>>> *Query *- Selecting all the columns on partitioned parquet table
> >>>>
> >>>> I observed few things from Query statistics :
> >>>>
> >>>> Value
> >>>>
> >>>> Before Refresh Metadata
> >>>>
> >>>> After Refresh Metadata
> >>>>
> >>>> Fragments
> >>>>
> >>>> 1
> >>>>
> >>>> 13
> >>>>
> >>>> DURATION
> >>>>
> >>>> 01 min 0.233 sec
> >>>>
> >>>> 18 min 0.744 sec
> >>>>
> >>>> PLANNING
> >>>>
> >>>> 59.818 sec
> >>>>
> >>>> 33.087 sec
> >>>>
> >>>> QUEUED
> >>>>
> >>>> Not Available
> >>>>
> >>>> Not Available
> >>>>
> >>>> EXECUTION
> >>>>
> >>>> 0.415 sec
> >>>>
> >>>> 17 min 27.657 sec
> >>>>
> >>>> The planning time is being reduced by approx 60% but the execution
> time
> >>>> increased  drastically.
> >>>> I would like to understand why the exceution time increases after the
> >>>> metadata refresh .
> >>>>
> >>>>
> >>>> Appreciate the help.
> >>>>
> >>>> Thanks,
> >>>> divya
> >>>>
> >>>>
> >>>> On 17 August 2017 at 11:54, Padma Penumarthy <ppenumar...@mapr.com>
> >>> wrote:
> >>>>
> >>>>> Refresh table metadata should  help reduce query planning time.
> >>>>> It is odd that it went up after you did refresh table metadata.
> >>>>> Did you check the logs to see what is happening ? You might have to
> >>>>> turn on some debugs if needed.
> >>>>> BTW, what version of Drill are you running ?
> >>>>>
> >>>>> Thanks,
> >>>>> Padma
> >>>>>
> >>>>>
> >>>>>> On Aug 16, 2017, at 8:15 PM, Divya Gehlot <divya.htco...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>> I have data in parquet file format .
> >>>>>> when I run the query the data and see the execution plan I could see
> >>>>>> following
> >>>>>> statistics
> >>>>>>
> >>>>>>> TOTAL FRAGMENTS: 1
> >>>>>>>> DURATION: 01 min 0.233 sec
> >>>>>>>> PLANNING: 59.818 sec
> >>>>>>>> QUEUED: Not Available
> >>>>>>>> EXECUTION: 0.415 sec
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> As its a paquet file format I tried enabling refresh meta data
> >>>>>> and run below command
> >>>>>> REFRESH TABLE METADATA <path to table> ;
> >>>>>> then run the same query again on the same table same data (no
> changes
> >>> in
> >>>>>> data)  and could find the statistics as show below :
> >>>>>>
> >>>>>> TOTAL FRAGMENTS: 13
> >>>>>>>> DURATION: 14 min 14.604 sec
> >>>>>>>> PLANNING: 33.087 sec
> >>>>>>>> QUEUED: Not Available
> >>>>>>>> EXECUTION: Not Available
> >>>>>>>
> >>>>>>>
> >>>>>> The query is still running .
> >>>>>>
> >>>>>> Can somebody help me  understand why the query taking so long once I
> >>>>> issue
> >>>>>> the refresh metadata command.
> >>>>>>
> >>>>>> Aprreciate the help !
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Divya
> >>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: Query Optimization

Reply via email to