Interesting. I wonder if it is related to the varchar issue Zelaine mentioned above: even with the specific columns listed in my select, the query plan shows a SELECT * being pushed down to Postgres. Does the projection not get pushed down to the JDBC source?
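To check, I'll run something like the following and see whether the JDBC sub-scan in the plan still shows a SELECT * (the pg plugin name and the column names below are just placeholders for my actual setup):

explain plan for
select int_col_1, int_col_2       -- placeholder column names
from pg.public.my_large_table;    -- placeholder storage plugin / schema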
I will create another table with only the columns I want and try again to see if it is in fact due to the varchar columns (a rough sketch of that CTAS is in the PS at the bottom).

Thanks

On Fri, May 13, 2016 at 10:54 AM Stefan Sedich <stefan.sed...@gmail.com> wrote:

> Jason,
>
> Ran the following:
>
> alter session set `store.format`='csv';
> create table dfs.tmp.foo as select * from my_large_table;
>
> Same end result: it chews memory until it fills my heap and eventually hits the OOM. This table has a number of varchar columns, but I only selected a couple of columns in my select, so I was hoping it would avoid the issue mentioned above with varchar columns. I will create some other test tables later with only the values I need and see how that works out.
>
> Thanks
>
> On Fri, May 13, 2016 at 10:38 AM Jason Altekruse <ja...@dremio.com> wrote:
>
>> I am curious if this is a bug in the JDBC plugin. Can you try to change the output format to CSV? In that case we don't do any large buffering.
>>
>> Jason Altekruse
>> Software Engineer at Dremio
>> Apache Drill Committer
>>
>> On Fri, May 13, 2016 at 10:35 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>>
>>> Seems like it just ran out of memory again and was not hanging. I tried appending a limit 100 to the select query and it still runs out of memory. I just ran the CTAS against some other smaller tables and it works fine.
>>>
>>> I will play around with this some more on the weekend; I can only assume I am messing something up here. I have in the past created parquet files from large tables without any issue, will report back.
>>>
>>> Thanks
>>>
>>> On Fri, May 13, 2016 at 10:05 AM Abdel Hakim Deneche <adene...@maprtech.com> wrote:
>>>
>>>> Stefan,
>>>>
>>>> Can you share the query profile for the query that seems to be running forever? You won't find it on disk, but you can append .json to the profile web URL and save the file.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, May 13, 2016 at 9:55 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>>>>
>>>>> Zelaine,
>>>>>
>>>>> It does, I forgot about those ones. I will do a test where I filter those out and see how I go; in my test with a 12GB heap size it seemed to just sit there forever and not finish.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, May 13, 2016 at 9:50 AM Zelaine Fong <zf...@maprtech.com> wrote:
>>>>>
>>>>>> Stefan,
>>>>>>
>>>>>> Does your source data contain varchar columns? We've seen instances where Drill isn't as efficient as it can be when Parquet is dealing with variable length columns.
>>>>>>
>>>>>> -- Zelaine
>>>>>>
>>>>>> On Fri, May 13, 2016 at 9:26 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for getting back to me so fast!
>>>>>>>
>>>>>>> I was just playing with that now; went up to 8GB and still ran into it, trying to go higher to see if I can find the sweet spot, only got 16GB total RAM on this laptop :)
>>>>>>>
>>>>>>> Is this an expected amount of memory for a table that isn't overly huge (16 million rows, 6 columns of integers)? Even now with a 12GB heap it seems to have filled up again.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, May 13, 2016 at 9:20 AM Jason Altekruse <ja...@dremio.com> wrote:
>>>>>>>
>>>>>>>> I could not find anywhere this is mentioned in the docs, but it has come up a few times on the list. While we made a number of efforts to move our interactions with the Parquet library to the off-heap memory (which we use everywhere else in the engine during processing), the version of the writer we are using still buffers a non-trivial amount of data into heap memory when writing parquet files. Try raising your JVM heap memory in drill-env.sh on startup and see if that prevents the out of memory issue.
>>>>>>>>
>>>>>>>> Jason Altekruse
>>>>>>>> Software Engineer at Dremio
>>>>>>>> Apache Drill Committer
>>>>>>>>
>>>>>>>> On Fri, May 13, 2016 at 9:07 AM, Stefan Sedich <stefan.sed...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Just trying to do a CTAS on a postgres table; it is not huge and only has 16-odd million rows. I end up with an out of memory after a while:
>>>>>>>>>
>>>>>>>>> Unable to handle out of memory condition in FragmentExecutor.
>>>>>>>>>
>>>>>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>>>>
>>>>>>>>> Is there a way to avoid this without needing to do the CTAS on a subset of my table?
>>>>
>>>> --
>>>> Abdelhakim Deneche
>>>>
>>>> Software Engineer
>>>>
>>>> <http://www.mapr.com/>
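PS: for reference, the narrow-table test I mentioned at the top will be roughly the following, with `store.format` switched back to parquet first (again, the column names and the pg plugin name are placeholders for my actual setup):

alter session set `store.format` = 'parquet';
create table dfs.tmp.my_large_table_narrow as
select int_col_1, int_col_2, int_col_3,
       int_col_4, int_col_5, int_col_6    -- placeholder names for the six integer columns
from pg.public.my_large_table;            -- placeholder plugin / schema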