Jason,

Ran the following:

  alter session set `store.format`='csv';
  create table dfs.tmp.foo as select * from my_large_table;

Same end result: it chews through memory until it fills up my heap and eventually hits the OOM error. This table has a number of varchar columns, but I only selected a couple of columns in my SELECT, so I was hoping it would avoid the issue mentioned above with varchar columns. I will create some other test tables later with only the values I need and see how that works out.
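For those narrower test tables, the kind of thing I have in mind is below; the column names are just placeholders, not the real schema:

  -- keep the CSV writer, but only project the handful of columns actually needed
  alter session set `store.format`='csv';
  create table dfs.tmp.foo_narrow as
  select col_a, col_b, col_c   -- placeholder column names
  from my_large_table;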
Thanks

On Fri, May 13, 2016 at 10:38 AM Jason Altekruse <ja...@dremio.com> wrote:

> I am curious if this is a bug in the JDBC plugin. Can you try to change the output format to CSV? In that case we don't do any large buffering.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Fri, May 13, 2016 at 10:35 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>
> > Seems like it just ran out of memory again and was not hanging. I tried to append a limit 100 to the select query and it still runs out of memory. Just ran the CTAS against some other smaller tables and it works fine.
> >
> > I will play around with this some more on the weekend. I can only assume I am messing something up here; I have in the past created parquet files from large tables without any issue. Will report back.
> >
> > Thanks
> >
> > On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> >
> > > Stefan,
> > >
> > > Can you share the query profile for the query that seems to be running forever? You won't find it on disk, but you can append .json to the profile web url and save the file.
> > >
> > > Thanks
> > >
> > > On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > >
> > > > Zelaine,
> > > >
> > > > It does, I forgot about those ones. I will do a test where I filter those out and see how I go. In my test with a 12GB heap size it seemed to just sit there forever and not finish.
> > > >
> > > > Thanks
> > > >
> > > > On Fri, May 13, 2016 at 9:50 AM Zelaine Fong <zf...@maprtech.com> wrote:
> > > >
> > > > > Stefan,
> > > > >
> > > > > Does your source data contain varchar columns? We've seen instances where Drill isn't as efficient as it can be when Parquet is dealing with variable length columns.
> > > > >
> > > > > -- Zelaine
> > > > >
> > > > > On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for getting back to me so fast!
> > > > > >
> > > > > > I was just playing with that now; went up to 8GB and still ran into it. Trying to go higher to see if I can find the sweet spot, only got 16GB total RAM on this laptop :)
> > > > > >
> > > > > > Is this an expected amount of memory for a table that is not overly huge (16 million rows, 6 columns of integers)? Even now, a 12GB heap seems to have filled up again.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Fri, May 13, 2016 at 9:20 AM Jason Altekruse <ja...@dremio.com> wrote:
> > > > > >
> > > > > > > I could not find anywhere this is mentioned in the docs, but it has come up a few times on the list.
> > > > > > > While we made a number of efforts to move our interactions with the Parquet library to off-heap memory (which we use everywhere else in the engine during processing), the version of the writer we are using still buffers a non-trivial amount of data into heap memory when writing parquet files. Try raising your JVM heap memory in drill-env.sh on startup and see if that prevents the out of memory issue.
> > > > > > >
> > > > > > > Jason Altekruse
> > > > > > > Software Engineer at Dremio
> > > > > > > Apache Drill Committer
> > > > > > >
> > > > > > > On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Just trying to do a CTAS on a postgres table. It is not huge and only has 16-odd million rows, but I end up with an out of memory error after a while.
> > > > > > > >
> > > > > > > > Unable to handle out of memory condition in FragmentExecutor.
> > > > > > > >
> > > > > > > > java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > > >
> > > > > > > > Is there a way to avoid this without needing to do the CTAS on a subset of my table?
> > >
> > > --
> > > Abdelhakim Deneche
> > > Software Engineer
> > > <http://www.mapr.com/>
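For grabbing the profile Hakim asked about, something along these lines should work; the host is the default web UI address for a local Drillbit, and <query-id> is a placeholder for the id shown on the /profiles page:

  # pull the profile JSON straight from the Drill web UI
  # (replace <query-id> with the real id from http://localhost:8047/profiles)
  curl -o query_profile.json "http://localhost:8047/profiles/<query-id>.json"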
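For the heap bump Jason suggested, this is roughly what the change looks like in conf/drill-env.sh; the variable names below are the ones the stock file ships with (if your version differs, the same idea applies), and the sizes are just what I tried on a 16GB laptop:

  # conf/drill-env.sh -- raise the JVM heap used by the Drillbit
  # in the stock file these feed -Xms/-Xmx and -XX:MaxDirectMemorySize via DRILL_JAVA_OPTS
  DRILL_HEAP="12G"
  DRILL_MAX_DIRECT_MEMORY="8G"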