I am running 6 drillbits. They were running with 20 GB of direct memory and 4 GB of heap, and I changed them to run with 18 GB of direct and 6 GB of heap, and I am still getting this error.
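For reference, the settings I am changing are the memory lines in conf/drill-env.sh. This is a sketch from memory rather than a paste from the cluster, but the relevant lines look roughly like:

  export DRILL_MAX_DIRECT_MEMORY="18G"   # was "20G"
  export DRILL_HEAP="6G"                 # was "4G"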
I am running a query and trying to understand why so much heap space is being used. The data is Parquet files, organized into directories by date (2015-01-01, 2015-01-02, etc.):

  TABLE
    ---> 2015-01-01
    ---> 2015-01-02
    etc.

This data isn't what I would call "huge": at most 500 MB per day, with 69 Parquet files per day. While I do have the planning issue related to lots of directories with lots of files (see other emails), I don't think that is related here.

I have a view that is basically select dir0 as src_date, field1, field2, field3 from table. Then I run a query such as:

  select src_date, count(1) from view_table where src_date >= '2016-02-25' group by src_date

That works. If I run:

  select src_date, count(1) from view_table where src_date >= '2016-02-01' group by src_date

it will hang, and eventually the drillbit crashes and restarts, with error logs pointing to Java heap space issues. This is the same with 4 GB or 6 GB of heap.

So my question is this: given the data, how do I troubleshoot this and provide helpful feedback? I am running the MapR 1.4 Developer Release right now, and this seems like a bug to me: why would a single query be able to crash a node? Shouldn't the query be terminated instead? And even so, why would an aggregation over 30 days of 500 MB/day crash at all? Loading the entire data set into RAM would take about 15 GB of direct memory per node, which is available.
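For completeness, the view is created roughly like this. The workspace name and path are placeholders for illustration, and field1/field2/field3 stand in for the real column names; the point is just that the view exposes the dir0 partition directory as src_date:

  create view dfs.tmp.view_table as
  select dir0 as src_date, field1, field2, field3
  from dfs.`/path/to/table`;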
