Jason,

Ran the following:

  alter session set `store.format`='csv';
  create table dfs.tmp.foo as select * from my_large_table;

Same end result: it chews through memory until it fills up my heap and eventually hits the OOM error. This table has a number of varchar columns, but I only selected a couple of columns in my SELECT, so I was hoping it would avoid the issue mentioned above with varchar columns. I will create some other test tables later with only the values I need and see how that works out.
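For those narrower test tables, the kind of thing I have in mind is below; the column names are just placeholders, not the real schema:

  -- keep the CSV writer, but only project the handful of columns actually needed
  alter session set `store.format`='csv';
  create table dfs.tmp.foo_narrow as
  select col_a, col_b, col_c   -- placeholder column names
  from my_large_table;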
Thanks

On Fri, May 13, 2016 at 10:38 AM Jason Altekruse <ja...@dremio.com> wrote:

> I am curious if this is a bug in the JDBC plugin. Can you try to change the output format to CSV? In that case we don't do any large buffering.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Fri, May 13, 2016 at 10:35 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>
> > Seems like it just ran out of memory again and was not hanging. I tried to append a limit 100 to the select query and it still runs out of memory. Just ran the CTAS against some other smaller tables and it works fine.
> >
> > I will play around with this some more on the weekend. I can only assume I am messing something up here; I have in the past created parquet files from large tables without any issue. Will report back.
> >
> > Thanks
> >
> > On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> >
> > > Stefan,
> > >
> > > Can you share the query profile for the query that seems to be running forever? You won't find it on disk, but you can append .json to the profile web url and save the file.
> > >
> > > Thanks
> > >
> > > On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > >
> > > > Zelaine,
> > > >
> > > > It does, I forgot about those ones. I will do a test where I filter those out and see how I go. In my test with a 12GB heap size it seemed to just sit there forever and not finish.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, May 13, 2016 at 9:50 AM Zelaine Fong <zf...@maprtech.com> wrote:
> > > >
> > > > > Stefan,
> > > > >
> > > > > Does your source data contain varchar columns? We've seen instances where Drill isn't as efficient as it can be when Parquet is dealing with variable length columns.
> > > > >
> > > > > -- Zelaine
> > > > >
> > > > > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for getting back to me so fast!
> > > > > >
> > > > > > I was just playing with that now; went up to 8GB and still ran into it. Trying to go higher to see if I can find the sweet spot, only got 16GB total RAM on this laptop :)
> > > > > >
> > > > > > Is this an expected amount of memory for a table that is not overly huge (16 million rows, 6 columns of integers)? Even now, a 12GB heap seems to have filled up again.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse <ja...@dremio.com> wrote:
> > > > > >
> > > > > > > I could not find anywhere this is mentioned in the docs, but it has come up a few times on the list.
> > > > > > > While we made a number of efforts to move our interactions with the Parquet library to off-heap memory (which we use everywhere else in the engine during processing), the version of the writer we are using still buffers a non-trivial amount of data into heap memory when writing parquet files. Try raising your JVM heap memory in drill-env.sh on startup and see if that prevents the out of memory issue.
> > > > > > >
> > > > > > > Jason Altekruse
> > > > > > > Software Engineer at Dremio
> > > > > > > Apache Drill Committer
> > > > > > >
> > > > > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Just trying to do a CTAS on a postgres table. It is not huge and only has 16-odd million rows, but I end up with an out of memory error after a while.
> > > > > > > >
> > > > > > > > Unable to handle out of memory condition in FragmentExecutor.
> > > > > > > >
> > > > > > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > > >
> > > > > > > > Is there a way to avoid this without needing to do the CTAS on a subset of my table?
> > >
> > > --
> > > Abdelhakim Deneche
> > > Software Engineer
> > > <http://www.mapr.com/>
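For grabbing the profile Hakim asked about, something along these lines should work; the host is the default web UI address for a local Drillbit, and <query-id> is a placeholder for the id shown on the /profiles page:

  # pull the profile JSON straight from the Drill web UI
  # (replace <query-id> with the real id from http://localhost:8047/profiles)
  curl -o query_profile.json "http://localhost:8047/profiles/<query-id>.json"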
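For the heap bump Jason suggested, this is roughly what the change looks like in conf/drill-env.sh; the variable names below are the ones the stock file ships with (if your version differs, the same idea applies), and the sizes are just what I tried on a 16GB laptop:

  # conf/drill-env.sh -- raise the JVM heap used by the Drillbit
  # in the stock file these feed -Xms/-Xmx and -XX:MaxDirectMemorySize via DRILL_JAVA_OPTS
  DRILL_HEAP="12G"
  DRILL_MAX_DIRECT_MEMORY="8G"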