I am running 6 Drillbits. They were configured with 20 GB of direct memory
and 4 GB of heap, and I changed them to 18 GB of direct and 6 GB of heap,
but I am still getting this error.
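
For reference, the settings I changed are the standard ones in
conf/drill-env.sh on each node (values from memory):

export DRILL_MAX_DIRECT_MEMORY="18G"
export DRILL_HEAP="6G"

and I restarted the Drillbits afterwards.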

I am running a query and trying to understand why so much heap space is
being used. The data is in Parquet files, organized into directories by
date (2015-01-01, 2015-01-02, etc.):

TABLE
---> 2015-01-01
---> 2015-01-02

...and so on, one directory per day.

This data isn't what I would call "huge": at most 500 MB per day, split
across 69 Parquet files per day. While I do have the planning issue related
to lots of directories with lots of files (see my other emails), I don't
think that is related here.

I have a view that basically selects dir0 as src_date, field1, field2,
field3 from the table.
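Roughly, the view definition looks like this (workspace and path are
placeholders, and field1..field3 stand in for my real columns):

create or replace view dfs.tmp.view_table as
select dir0 as src_date, field1, field2, field3
from dfs.`/data/TABLE`;

Then I run a query such as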

select src_date, count(1) from view_table where src_date >= '2016-02-25'
group by src_date

That will work.

If I run

select src_date, count(1) from view_table where src_date >= '2016-02-01'
group by src_date

That will hang, and eventually I will see the Drillbit crash and restart,
with the error logs pointing to Java heap space issues. This is the same
with 4 GB or 6 GB of heap.
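
The only difference between the two queries is the lower bound on
src_date, so bisecting the range, e.g.

select src_date, count(1) from view_table where src_date >= '2016-02-15'
group by src_date

should show roughly how many days of data it takes before things tip over.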

So my question is this...

Given the data, how do I troubleshoot this and provide helpful feedback? I
am running the MapR 1.4 Developer Release right now. This seems like a real
problem to me: why should a single query be able to crash a node? Shouldn't
the query just be terminated? And even then, why would 30 days of 500 MB of
data crash the node on that sort of aggregation? Loading the ENTIRE data
set into RAM would only take 15 GB of direct memory per node, which is
available.
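
If the query plan would be useful feedback, I can capture it with something
like

explain plan for
select src_date, count(1) from view_table where src_date >= '2016-02-01'
group by src_date;

which should at least show whether the filter on dir0 is being pruned down
to the 30 days of directories or whether the scan is touching every file
under TABLE.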
