One other thing ... have you tried tuning the planner.memory_limit parameter? Based on the earlier stack trace, you're hitting a memory limit during query planning. So, tuning this parameter should help that. The default is 256 MB.
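For reference, a minimal sketch of raising that option at runtime. The option name and 256 MB default are from this thread; the 512 MB value and the embedded-mode connection string are only illustrative, so adjust both for your setup:

```shell
# Raise the planning-time memory limit (bytes) from the 256 MB default
# to 512 MB for the current session. The zk=local connection string
# assumes embedded mode; point it at your ZooKeeper quorum on a cluster.
sqlline -u jdbc:drill:zk=local <<'EOF'
ALTER SESSION SET `planner.memory_limit` = 536870912;
EOF
```

Use `ALTER SYSTEM` instead of `ALTER SESSION` to make the change stick across sessions.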
-- Zelaine

On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli <challapallira...@gmail.com> wrote:

> While planning we use heap memory. 2 GB of heap should be sufficient for
> what you mentioned. This looks like a bug to me. Can you raise a JIRA for
> it? It would also be super helpful if you could attach the data set used.
>
> Rahul
>
> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacep...@gmail.com> wrote:
>
>> Sure, this is what I remember:
>>
>> * Failure
>>   - embedded mode on my laptop
>>   - drill memory: 2 GB / 4 GB (heap/direct)
>>   - cpu: 4 cores (+ hyperthreading)
>>   - `planner.width.max_per_node=6`
>>
>> * Success
>>   - AWS cluster, 2x c3.8xlarge
>>   - drill memory: 16 GB / 32 GB
>>   - cpu: limited by Kubernetes to 24 cores
>>   - `planner.width.max_per_node=23`
>>
>> I'm too busy right now to test again, but I'll try to provide better info
>> as soon as I can.
>>
>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>>
>>> Can you please share the number of cores on the setup where the query hung
>>> as compared to the number of cores on the setup where the query went
>>> through successfully, and the memory details for the two scenarios?
>>>
>>> Thanks,
>>> Khurram
>>>
>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacep...@gmail.com> wrote:
>>>
>>>> For the record, I think this was just bad memory configuration after all.
>>>> I retested on bigger machines and everything seems to be working fine.
>>>>
>>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>>
>>>>> Oscar, can you please report a JIRA with the steps required to reproduce
>>>>> the OOM error? That way someone from the Drill team will take a look and
>>>>> investigate.
>>>>>
>>>>> For others interested, here is the stack trace.
>>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
>>>>> exiting. Information message: Unable to handle out of memory condition in Foreman.
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>     at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>>     at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>>     at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>>     at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>     at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>     at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>>
>>>>> Thanks,
>>>>> Khurram
>>>>>
>>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com> wrote:
>>>>>
>>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>>>>>> explain succeeds within ~30s. Enabling any of the other lines triggers
>>>>>> the failure.
>>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>>
>>>>>> The client times out around here (~1.5 hours):
>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>
>>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>
>>>>>> The memory settings for this test were:
>>>>>>
>>>>>>     DRILL_HEAP="4G"
>>>>>>     DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>>
>>>>>> This is on a laptop with 16G and I should probably lower it, but it seems
>>>>>> a bit excessive for such a small query. And I think I got the same results
>>>>>> on a 2-node cluster with 8/16. I'm going to try again on the cluster to
>>>>>> make sure.
>>>>>>
>>>>>> Thanks,
>>>>>> Oscar
>>>>>>
>>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>>
>>>>>>> You mentioned: "But if I uncomment the where clause then it runs for a
>>>>>>> couple of hours until it runs out of memory."
>>>>>>>
>>>>>>> Can you please share the OutOfMemory details from drillbit.log and the
>>>>>>> value of DRILL_MAX_DIRECT_MEMORY?
>>>>>>>
>>>>>>> Can you also check whether the explain succeeds when you retain just the
>>>>>>> line `upload_date = '2016-08-01'` in your where clause?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Khurram
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacep...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> I've been stuck with this for a while and I'm not sure if I'm running
>>>>>>>> into a bug or I'm just doing something very wrong.
>>>>>>>> I have this stripped-down version of my query:
>>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>>
>>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>>
>>>>>>>> Without changing anything, explain takes ~1s on my machine. But if I
>>>>>>>> uncomment the where clause, then it runs for a couple of hours until
>>>>>>>> it runs out of memory.
>>>>>>>>
>>>>>>>> Also, if I uncomment the where clause *and* take out the join, then it
>>>>>>>> takes around 30s to plan.
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>> Thanks!
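For anyone landing on this thread later: the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY settings quoted up-thread live in Drill's startup environment file. A sketch of the relevant `conf/drill-env.sh` fragment, using the values from the failing laptop test as an example rather than as recommendations:

```shell
# conf/drill-env.sh -- memory settings referenced in this thread.
# DRILL_HEAP bounds the JVM heap, which is what query planning uses
# (the OutOfMemoryError above is a heap OOM in the Foreman);
# DRILL_MAX_DIRECT_MEMORY bounds the direct memory used at execution time.
export DRILL_HEAP="4G"
export DRILL_MAX_DIRECT_MEMORY="8G"
```

Restart the drillbit after changing either value for it to take effect.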