One other thing ... have you tried tuning the `planner.memory_limit`
parameter?  Based on the earlier stack trace, you're hitting a memory limit
during query planning, so tuning that parameter should help.  The default
is 256 MB.
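For reference, a minimal sketch of how to raise it from a Drill client
(the value is in bytes, so 1 GB = 1073741824 -- pick a number that fits
your heap budget, and double-check the option name against your Drill
version's docs):

```sql
-- Raise the planning memory ceiling from the 256 MB default to 1 GB.
-- Use ALTER SESSION instead of ALTER SYSTEM to scope the change to the
-- current session only.
ALTER SYSTEM SET `planner.memory_limit` = 1073741824;
```

Note this only governs planning-time memory; it won't help if the OOM is
happening during execution.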

-- Zelaine

On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli <
challapallira...@gmail.com> wrote:

> While planning we use heap memory, and 2GB of heap should be sufficient
> for what you mentioned.  This looks like a bug to me.  Could you raise a
> JIRA for it?  It would also be very helpful if you could attach the data
> set used.
>
> Rahul
>
> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacep...@gmail.com>
> wrote:
>
> > Sure,
> > This is what I remember:
> >
> > * Failure
> >    - embedded mode on my laptop
> >    - drill memory: 2GB/4GB (heap/direct)
> >    - cpu: 4 cores (+hyperthreading)
> >    - `planner.width.max_per_node=6`
> >
> > * Success
> >    - AWS cluster, 2x c3.8xlarge
> >    - drill memory: 16GB/32GB
> >    - cpu: limited by kubernetes to 24 cores
> >    - `planner.width.max_per_node=23`
> >
> > I'm too busy right now to test again, but I'll try to provide better
> > info as soon as I can.
> >
> >
> >
> > On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
> >
> >> Can you please share the number of cores on the setup where the query
> >> hung, compared to the number of cores on the setup where the query went
> >> through successfully?  And the memory details from the two scenarios.
> >>
> >> Thanks,
> >> Khurram
> >>
> >> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacep...@gmail.com>
> >> wrote:
> >>
> >>> For the record, I think this was just a bad memory configuration after
> >>> all.  I retested on bigger machines and everything seems to be working
> >>> fine.
> >>>
> >>>
> >>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
> >>>
> >>>> Oscar, can you please report a JIRA with the steps required to
> >>>> reproduce the OOM error?  That way someone from the Drill team will
> >>>> take a look and investigate.
> >>>>
> >>>> For others interested here is the stack trace.
> >>>>
> >>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
> >>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
> >>>> Occurred, exiting. Information message: Unable to handle out of memory
> >>>> condition in Foreman.
> >>>> java.lang.OutOfMemoryError: Java heap space
> >>>>        at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
> >>>>        at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
> >>>>        at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
> >>>>        at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
> >>>>        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
> >>>>        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
> >>>>        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
> >>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> >>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
> >>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
> >>>>        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> >>>>
> >>>> Thanks,
> >>>> Khurram
> >>>>
> >>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
> >>>>> explain succeeds within ~30s.  Enabling any of the other lines
> >>>>> triggers the failure.
> >>>>>
> >>>>> This is a log with the `upload_date` lines and `usage <> 'Test'`
> >>>>> enabled:
> >>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
> >>>>>
> >>>>> The client times out around here (~1.5 hours):
> >>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
> >>>>>
> >>>>> And it still keeps running for a while until it dies (~2.5 hours):
> >>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
> >>>>>
> >>>>> The memory settings for this test were:
> >>>>>
> >>>>>    DRILL_HEAP="4G"
> >>>>>    DRILL_MAX_DIRECT_MEMORY="8G"
> >>>>>
> >>>>> This is on a laptop with 16G, and I should probably lower it, but it
> >>>>> seems a bit excessive for such a small query.  And I think I got the
> >>>>> same results on a 2-node cluster with 8/16.  I'm going to try again
> >>>>> on the cluster to make sure.
> >>>>>
> >>>>> Thanks,
> >>>>> Oscar
> >>>>>
> >>>>>
> >>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
> >>>>>
> >>>>>> You mentioned "*But if I uncomment the where clause then it runs
> >>>>>> for a couple of hours until it runs out of memory.*"
> >>>>>>
> >>>>>> Can you please share the OutOfMemory details from drillbit.log and
> >>>>>> the value of DRILL_MAX_DIRECT_MEMORY?
> >>>>>>
> >>>>>> Can you also try to see what happens if you retain just the
> >>>>>> upload_date = '2016-08-01' line in your where clause, and check
> >>>>>> whether the explain succeeds?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Khurram
> >>>>>>
> >>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacep...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi there,
> >>>>>>>
> >>>>>>> I've been stuck with this for a while, and I'm not sure if I'm
> >>>>>>> running into a bug or just doing something very wrong.
> >>>>>>>
> >>>>>>> I have this stripped-down version of my query:
> >>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
> >>>>>>>
> >>>>>>> The data is just a single file with one record (1.5K).
> >>>>>>>
> >>>>>>> Without changing anything, explain takes ~1sec on my machine.  But
> >>>>>>> if I uncomment the where clause, then it runs for a couple of
> >>>>>>> hours until it runs out of memory.
> >>>>>>>
> >>>>>>> Also, if I uncomment the where clause *and* take out the join,
> >>>>>>> then it takes around 30s to plan.
> >>>>>>>
> >>>>>>> Any ideas?
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>>
>
