Ah ... yes, you're right. I forgot that was off heap. -- Zelaine
On Thu, Sep 1, 2016 at 11:41 AM, Sudheesh Katkam <skat...@maprtech.com> wrote:

> That setting is for off-heap memory. The earlier case hit the heap memory
> limit.
>
>> On Sep 1, 2016, at 11:36 AM, Zelaine Fong <zf...@maprtech.com> wrote:
>>
>> One other thing ... have you tried tuning the planner.memory_limit
>> parameter? Based on the earlier stack trace, you're hitting a memory
>> limit during query planning. So, tuning this parameter should help
>> that. The default is 256 MB.
>>
>> -- Zelaine
>>
>> On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli
>> <challapallira...@gmail.com> wrote:
>>
>>> While planning we use heap memory. 2GB of heap should be sufficient
>>> for what you mentioned. This looks like a bug to me. Can you raise a
>>> jira for the same? And it would be super helpful if you can also
>>> attach the data set used.
>>>
>>> Rahul
>>>
>>> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacep...@gmail.com>
>>> wrote:
>>>
>>>> Sure,
>>>> This is what I remember:
>>>>
>>>> * Failure
>>>>   - embedded mode on my laptop
>>>>   - drill memory: 2Gb/4Gb (heap/direct)
>>>>   - cpu: 4 cores (+hyperthreading)
>>>>   - `planner.width.max_per_node=6`
>>>>
>>>> * Success
>>>>   - AWS cluster, 2x c3.8xlarge
>>>>   - drill memory: 16Gb/32Gb
>>>>   - cpu: limited by kubernetes to 24 cores
>>>>   - `planner.width.max_per_node=23`
>>>>
>>>> I'm too busy right now to test again, but I'll try to provide better
>>>> info as soon as I can.
>>>>
>>>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>>>>
>>>>> Can you please share the number of cores on the setup where the
>>>>> query hung as compared to the number of cores on the setup where
>>>>> the query went through successfully, and details of memory from
>>>>> the two scenarios.
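[For readers following along: the two planner options discussed above are ordinary Drill system/session options. A minimal sketch of how they are set from any SQL client — the values are illustrative, not recommendations, and note Sudheesh's correction that planner.memory_limit governs off-heap (direct) memory, so raising it will not cure a Java-heap OOM during planning:]

```sql
-- Planner memory budget in bytes; the default is 268435456 (256 MB).
-- Illustrative value: double it to 512 MB.
ALTER SYSTEM SET `planner.memory_limit` = 536870912;

-- Maximum degree of parallelism (minor fragments) per node,
-- as tuned by Oscar in the thread above.
ALTER SESSION SET `planner.width.max_per_node` = 6;
```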
>>>>>
>>>>> Thanks,
>>>>> Khurram
>>>>>
>>>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacep...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> For the record, I think this was just bad memory configuration
>>>>>> after all. I retested on bigger machines and everything seems to
>>>>>> be working fine.
>>>>>>
>>>>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>>>>
>>>>>>> Oscar, can you please report a JIRA with the required steps to
>>>>>>> reproduce the OOM error. That way someone from the Drill team
>>>>>>> will take a look and investigate.
>>>>>>>
>>>>>>> For others interested, here is the stack trace:
>>>>>>>
>>>>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>>>>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
>>>>>>> Occurred, exiting. Information message: Unable to handle out of
>>>>>>> memory condition in Foreman.
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>   at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>>>>   at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>>>>   at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>>>>   at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>>>>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Khurram
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0
>>>>>>>> alias), explain succeeds within ~30s. Enabling any of the other
>>>>>>>> lines triggers the failure.
>>>>>>>>
>>>>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'`
>>>>>>>> enabled:
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>>>>
>>>>>>>> The client times out around here (~1.5 hours):
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>>
>>>>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>>
>>>>>>>> The memory settings for this test were:
>>>>>>>>
>>>>>>>> DRILL_HEAP="4G"
>>>>>>>> DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>>>>
>>>>>>>> This is on a laptop with 16G and I should probably lower it, but
>>>>>>>> it seems a bit excessive for such a small query. And I think I
>>>>>>>> got the same results on a 2 node cluster with 8/16. I'm gonna
>>>>>>>> try again on the cluster to make sure.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Oscar
>>>>>>>>
>>>>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>>>>
>>>>>>>>> You mentioned "*But if I uncomment the where clause then it
>>>>>>>>> runs for a couple of hours until it runs out of memory.*"
>>>>>>>>>
>>>>>>>>> Can you please share the OutOfMemory details from drillbit.log
>>>>>>>>> and the value of DRILL_MAX_DIRECT_MEMORY.
>>>>>>>>>
>>>>>>>>> Can you also try retaining just the line where
>>>>>>>>> upload_date = '2016-08-01' in your where clause, and check
>>>>>>>>> whether the explain succeeds.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Khurram
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacep...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I've been stuck with this for a while and I'm not sure if I'm
>>>>>>>>>> running into a bug or I'm just doing something very wrong.
>>>>>>>>>>
>>>>>>>>>> I have this stripped-down version of my query:
>>>>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>>>>
>>>>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>>>>
>>>>>>>>>> Without changing anything, explain takes ~1 sec on my machine.
>>>>>>>>>> But if I uncomment the where clause then it runs for a couple
>>>>>>>>>> of hours until it runs out of memory.
>>>>>>>>>>
>>>>>>>>>> Also if I uncomment the where clause *and* take out the join,
>>>>>>>>>> then it takes around 30s to plan.
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>> Thanks!
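[For anyone reproducing the setup discussed in this thread: the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY values Oscar quoted live in Drill's conf/drill-env.sh. A minimal sketch with the thread's values — illustrative for a 16G laptop, not a recommendation:]

```shell
# Sketch of conf/drill-env.sh memory settings, using the values from the
# thread (not recommendations). DRILL_HEAP sizes the JVM heap, which is
# what query planning uses; DRILL_MAX_DIRECT_MEMORY sizes the off-heap
# memory used during query execution.
DRILL_HEAP="4G"
DRILL_MAX_DIRECT_MEMORY="8G"
```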