Ah ... yes, you're right. I forgot that was off heap. -- Zelaine
On Thu, Sep 1, 2016 at 11:41 AM, Sudheesh Katkam <skat...@maprtech.com> wrote:

> That setting is for off-heap memory. The earlier case hit the heap memory
> limit.
>
>> On Sep 1, 2016, at 11:36 AM, Zelaine Fong <zf...@maprtech.com> wrote:
>>
>> One other thing ... have you tried tuning the planner.memory_limit
>> parameter? Based on the earlier stack trace, you're hitting a memory
>> limit during query planning. So, tuning this parameter should help
>> that. The default is 256 MB.
>>
>> -- Zelaine
>>
>> On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli
>> <challapallira...@gmail.com> wrote:
>>
>>> While planning we use heap memory. 2GB of heap should be sufficient
>>> for what you mentioned. This looks like a bug to me. Can you raise a
>>> jira for the same? And it would be super helpful if you can also
>>> attach the data set used.
>>>
>>> Rahul
>>>
>>> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacep...@gmail.com>
>>> wrote:
>>>
>>>> Sure,
>>>> This is what I remember:
>>>>
>>>> * Failure
>>>>   - embedded mode on my laptop
>>>>   - drill memory: 2Gb/4Gb (heap/direct)
>>>>   - cpu: 4 cores (+hyperthreading)
>>>>   - `planner.width.max_per_node=6`
>>>>
>>>> * Success
>>>>   - AWS cluster, 2x c3.8xlarge
>>>>   - drill memory: 16Gb/32Gb
>>>>   - cpu: limited by kubernetes to 24 cores
>>>>   - `planner.width.max_per_node=23`
>>>>
>>>> I'm too busy right now to test again, but I'll try to provide better
>>>> info as soon as I can.
>>>>
>>>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>>>>
>>>>> Can you please share the number of cores on the setup where the
>>>>> query hung as compared to the number of cores on the setup where
>>>>> the query went through successfully, and details of memory from
>>>>> the two scenarios.
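[For readers following along: the two planner options discussed above are ordinary Drill system/session options. A minimal sketch of how they are set from any SQL client — the values are illustrative, not recommendations, and note Sudheesh's correction that planner.memory_limit governs off-heap (direct) memory, so raising it will not cure a Java-heap OOM during planning:]

```sql
-- Planner memory budget in bytes; the default is 268435456 (256 MB).
-- Illustrative value: double it to 512 MB.
ALTER SYSTEM SET `planner.memory_limit` = 536870912;

-- Maximum degree of parallelism (minor fragments) per node,
-- as tuned by Oscar in the thread above.
ALTER SESSION SET `planner.width.max_per_node` = 6;
```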
>>>>>
>>>>> Thanks,
>>>>> Khurram
>>>>>
>>>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacep...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> For the record, I think this was just bad memory configuration
>>>>>> after all. I retested on bigger machines and everything seems to
>>>>>> be working fine.
>>>>>>
>>>>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>>>>
>>>>>>> Oscar, can you please report a JIRA with the required steps to
>>>>>>> reproduce the OOM error. That way someone from the Drill team
>>>>>>> will take a look and investigate.
>>>>>>>
>>>>>>> For others interested, here is the stack trace:
>>>>>>>
>>>>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>>>>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure
>>>>>>> Occurred, exiting. Information message: Unable to handle out of
>>>>>>> memory condition in Foreman.
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>   at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>>>>   at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>>>>   at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>>>>   at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>>>>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Khurram
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacep...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0
>>>>>>>> alias), explain succeeds within ~30s. Enabling any of the other
>>>>>>>> lines triggers the failure.
>>>>>>>>
>>>>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'`
>>>>>>>> enabled:
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>>>>
>>>>>>>> The client times out around here (~1.5 hours):
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>>
>>>>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>>>>
>>>>>>>> The memory settings for this test were:
>>>>>>>>
>>>>>>>> DRILL_HEAP="4G"
>>>>>>>> DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>>>>
>>>>>>>> This is on a laptop with 16G and I should probably lower it, but
>>>>>>>> it seems a bit excessive for such a small query. And I think I
>>>>>>>> got the same results on a 2 node cluster with 8/16. I'm gonna
>>>>>>>> try again on the cluster to make sure.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Oscar
>>>>>>>>
>>>>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>>>>
>>>>>>>>> You mentioned "*But if I uncomment the where clause then it
>>>>>>>>> runs for a couple of hours until it runs out of memory.*"
>>>>>>>>>
>>>>>>>>> Can you please share the OutOfMemory details from drillbit.log
>>>>>>>>> and the value of DRILL_MAX_DIRECT_MEMORY.
>>>>>>>>>
>>>>>>>>> Can you also try retaining just the line where
>>>>>>>>> upload_date = '2016-08-01' in your where clause, and check
>>>>>>>>> whether the explain succeeds.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Khurram
>>>>>>>>>
>>>>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacep...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I've been stuck with this for a while and I'm not sure if I'm
>>>>>>>>>> running into a bug or I'm just doing something very wrong.
>>>>>>>>>>
>>>>>>>>>> I have this stripped-down version of my query:
>>>>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>>>>
>>>>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>>>>
>>>>>>>>>> Without changing anything, explain takes ~1 sec on my machine.
>>>>>>>>>> But if I uncomment the where clause then it runs for a couple
>>>>>>>>>> of hours until it runs out of memory.
>>>>>>>>>>
>>>>>>>>>> Also if I uncomment the where clause *and* take out the join,
>>>>>>>>>> then it takes around 30s to plan.
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>> Thanks!
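[For anyone reproducing the setup discussed in this thread: the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY values Oscar quoted live in Drill's conf/drill-env.sh. A minimal sketch with the thread's values — illustrative for a 16G laptop, not a recommendation:]

```shell
# Sketch of conf/drill-env.sh memory settings, using the values from the
# thread (not recommendations). DRILL_HEAP sizes the JVM heap, which is
# what query planning uses; DRILL_MAX_DIRECT_MEMORY sizes the off-heap
# memory used during query execution.
DRILL_HEAP="4G"
DRILL_MAX_DIRECT_MEMORY="8G"
```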