Yes, tables/stats/iad/yyyymmdd is the structure, and each of those directories 
is a day, so when I run a query on tables/stats/iad/201604* I am running it on 
the whole month of April 2016. Refreshing metadata for tables/stats/iad tries 
to do it for all of the data, which goes back to January and is still 
ingesting new metrics up until today. This is where the refresh command gets 
stuck: since we are constantly ingesting, it can't finish refreshing the 
metadata for the last day. 
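
For reference, here is roughly what the commands involved look like (a sketch; 
the s3 workspace name comes from the query shown further down, and the paths 
are illustrative):

```sql
-- Query a whole month via a directory wildcard (one yyyymmdd dir per day):
SELECT COUNT(*) FROM s3.`tables/stats/iad/201604*/`;

-- Refresh metadata for the whole dataset; this walks every yyyymmdd
-- subdirectory, so it cannot finish while new data is still landing in
-- the newest one:
REFRESH TABLE METADATA s3.`tables/stats/iad`;
```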

Refreshing each day one at a time doesn't seem to work either: if there is no 
metadata cache in the top-level directory, a query only uses the metadata for 
the first day in the month and does not even consider the rest of the days. 
For example, counting the rows for all of April 2016 would only return the 
rows for April 1st, 2016. 
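
A sketch of the per-day attempt (paths are illustrative):

```sql
-- Refresh metadata one day at a time:
REFRESH TABLE METADATA s3.`tables/stats/iad/20160401`;
REFRESH TABLE METADATA s3.`tables/stats/iad/20160402`;
-- ...
-- Without a metadata cache file in tables/stats/iad itself, the wildcard
-- query appears to use only the first day's cached metadata:
SELECT COUNT(*) FROM s3.`tables/stats/iad/201604*/`;
```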

    On Thursday, 23 June 2016 2:25 PM, Neeraja Rentachintala 
<nrentachint...@maprtech.com> wrote:
 

 What is the partition/directory structure of your data?
Is it by day?

On Thu, Jun 23, 2016 at 1:48 PM, Tanmay Solanki <tsolank...@yahoo.in.invalid
> wrote:

> Yeah, so earlier I tried caching the metadata for one day and saw that it
> greatly improved performance. Then I tried doing this for a full month of
> data, but unfortunately that was not allowed. I had to run "refresh table
> metadata" on the full iad folder, since the command works by directory: it
> goes into each subdirectory and caches the metadata there, then goes back
> to the top-level directory and caches the metadata there. However, since we
> are constantly ingesting metrics, it gets to the last day, is not able to
> refresh the metadata there, and seems to be stuck in a loop (I had it
> running for 15000 seconds before I killed it). Is there any way to fix that
> or get around it without changing the structure of my data?
>
>    On Thursday, 23 June 2016 1:42 PM, Neeraja Rentachintala
> <nrentachint...@maprtech.com> wrote:
>
>
>  You might want to enable metadata caching and see if it helps.
>
>  https://drill.apache.org/docs/optimizing-parquet-metadata-reading/
>
> On Thu, Jun 23, 2016 at 1:36 PM, Tanmay Solanki
> <tsolank...@yahoo.in.invalid
> > wrote:
>
> > Below is the plan. The number of files is ~213,000 Parquet files.
> >
> > 0: jdbc:drill:> explain plan for select count(*) from
> > s3.`tables/stats/iad/201604*/`;
> > +------+------+
> > | text | json |
> > +------+------+
> > | 00-00    Screen
> > 00-01      Project(EXPR$0=[$0])
> > 00-02        Project(EXPR$0=[$0])
> > 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@7cb1a0e8[columns = null, isStarQuery = false, isSkipQuery = false]])
> >  | {
> >  "head" : {
> >    "version" : 1,
> >    "generator" : {
> >      "type" : "ExplainHandler",
> >      "info" : ""
> >    },
> >    "type" : "APACHE_DRILL_PHYSICAL",
> >    "options" : [ ],
> >    "queue" : 0,
> >    "resultMode" : "EXEC"
> >  },
> >  "graph" : [ {
> >    "pop" : "DirectGroupScan",
> >    "@id" : 3,
> >    "cost" : 20.0
> >  }, {
> >    "pop" : "project",
> >    "@id" : 2,
> >    "exprs" : [ {
> >      "ref" : "`EXPR$0`",
> >      "expr" : "`count`"
> >    } ],
> >    "child" : 3,
> >    "initialAllocation" : 1000000,
> >    "maxAllocation" : 10000000000,
> >    "cost" : 20.0
> >  }, {
> >    "pop" : "project",
> >    "@id" : 1,
> >    "exprs" : [ {
> >      "ref" : "`EXPR$0`",
> >      "expr" : "`EXPR$0`"
> >    } ],
> >    "child" : 2,
> >    "initialAllocation" : 1000000,
> >    "maxAllocation" : 10000000000,
> >    "cost" : 20.0
> >  }, {
> >    "pop" : "screen",
> >    "@id" : 0,
> >    "child" : 1,
> >    "initialAllocation" : 1000000,
> >    "maxAllocation" : 10000000000,
> >    "cost" : 20.0
> >  } ]
> > } |
> > +------+------+
> > 1 row selected (7493.869 seconds)
> > Additionally I have the drillbit.log for this query which I will post
> > below:
> > 2016-06-23 18:25:16,417 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> > INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> > 2893d673-3dad-dd21-d5e6-8ef28e0f81c9: explain plan for select count(*) from
> > s3.`tables/stats/iad/201604*/`
> > 2016-06-23 20:29:45,446 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> > INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed
> > 218474 out of 218474 using 16 threads. Time: 3474817ms total, 254.452884ms
> > avg, 50344ms max.
> > 2016-06-23 20:29:45,446 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:foreman]
> > INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed
> > 218474 out of 218474 using 16 threads. Earliest start: 431.101000 µs,
> > Latest start: 3474340355.187000 µs, Average start: 1753982685.665761 µs.
> > 2016-06-23 20:30:10,211 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> > INFO  o.a.d.e.w.fragment.FragmentExecutor -
> > 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State change requested
> > AWAITING_ALLOCATION --> RUNNING
> > 2016-06-23 20:30:10,211 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> > INFO  o.a.d.e.w.f.FragmentStatusReporter -
> > 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State to report: RUNNING
> > 2016-06-23 20:30:10,226 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> > INFO  o.a.d.e.w.fragment.FragmentExecutor -
> > 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State change requested RUNNING
> > --> FINISHED
> > 2016-06-23 20:30:10,226 [2893d673-3dad-dd21-d5e6-8ef28e0f81c9:frag:0:0]
> > INFO  o.a.d.e.w.f.FragmentStatusReporter -
> > 2893d673-3dad-dd21-d5e6-8ef28e0f81c9:0:0: State to report: FINISHED
> >
> >
> >
> >    On Thursday, 23 June 2016 11:22 AM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> >
> >  Also, how many files?  What format?
> >
> > Being so slow is an anomaly.
> >
> >
> >
> >
> > On Thu, Jun 23, 2016 at 11:15 AM, Khurram Faraaz <kfar...@maprtech.com>
> > wrote:
> >
> > > Can you please share the query plan for that long-running query here?
> > >
> > > On Thu, Jun 23, 2016 at 11:40 PM, Tanmay Solanki <
> > > tsolank...@yahoo.in.invalid> wrote:
> > >
> > > > I am trying to run a query on Apache Drill to simply count the number
> > > > of rows in a table stored in Parquet format in S3. I am running this
> > > > on a 20-node r3.8xlarge EC2 instance cluster, with direct memory set
> > > > to 80GB, heap memory set to 32GB, and
> > > > planner.memory.max_memory_per_node set to a very high value. However,
> > > > counting the rows in this table takes Drill around 7662 seconds, or
> > > > around 2 hours, on a 9.93TB dataset with 56 billion rows and 174
> > > > columns. From the logs and the web console, it seems that query
> > > > planning itself is taking nearly 99% of the time and actual query
> > > > execution is taking almost no time. I ran the same query on PrestoDB
> > > > on a similar setup (20-node r3.8xlarge) and found that it completed in
> > > > 137 seconds, or just over 2 minutes. Is there something wrong with my
> > > > configuration of Drill, or is this what is expected for Drill?
> > > >
> > >
> >
> >
> >
> >
>
>
>
>


  
