In the view.
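Below is a minimal Drill SQL sketch of that suggestion. It assumes the view lives in the writable `dfs.tmp` workspace and that the Parquet root is at the hypothetical path `/data/TABLE`. Both are placeholders and do not come from the thread. Only `dir0`, `src_date`, `field1` to `field3` and `view_table` appear in the original messages. The view casts the directory name to DATE, and the query then compares against a DATE literal rather than a string:

```sql
-- Hypothetical workspace (dfs.tmp) and path (/data/TABLE); adjust to the real layout.
-- dir0 holds the date-named subdirectory (e.g. '2016-02-01'); cast it to DATE in the view.
CREATE VIEW dfs.tmp.view_table AS
SELECT CAST(dir0 AS DATE) AS src_date, field1, field2, field3
FROM dfs.`/data/TABLE`;

-- Filter with a DATE literal so both sides of the comparison are dates.
SELECT src_date, COUNT(1)
FROM dfs.tmp.view_table
WHERE src_date >= DATE '2016-02-01'
GROUP BY src_date;
```

If a view with that name already exists, `CREATE VIEW` will fail, so the old view must be dropped first. The cast also fails if any subdirectory name is not a valid date. This thread does not establish whether the change reduces heap usage. The sketch shows only the change being proposed.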

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Mar 1, 2016 at 6:02 AM, John Omernik <[email protected]> wrote:

> In the view or in the query?
>
> On Mon, Feb 29, 2016 at 9:05 PM, Jacques Nadeau <[email protected]> wrote:
>
> > Can you try to convert src_date to a date type?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Feb 29, 2016 at 10:28 AM, John Omernik <[email protected]> wrote:
> >
> > > I am running 6 drillbits. They were running with 20 GB of direct memory
> > > and 4 GB of heap. I changed them to run with 18 GB of direct memory and
> > > 6 GB of heap, and I am still getting this error.
> > >
> > > I am running a query and trying to understand why so much heap space is
> > > being used. The data is Parquet files, organized into directories by date
> > > (2015-01-01, 2015-01-02, etc.):
> > >
> > > TABLE
> > > ---> 2015-01-01
> > > ---> 2015-01-02
> > >
> > > Etc
> > >
> > > This data isn't what I would call "huge": at most 500 MB per day, with 69
> > > Parquet files per day. I do have the planning issue related to lots of
> > > directories with lots of files (see other emails), but I don't think that
> > > is related here.
> > >
> > > I have a view that basically selects dir0 as src_date, field1, field2,
> > > field3 from the table. Then I run a query such as
> > >
> > > select src_date, count(1) from view_table where src_date >= '2016-02-25'
> > > group by src_date
> > >
> > > That will work.
> > >
> > > If I run
> > >
> > > select src_date, count(1) from view_table where src_date >= '2016-02-01'
> > > group by src_date
> > >
> > > That will hang, and eventually I see the drillbit crash and restart, and
> > > the error logs point to Java heap space issues. This is the same with 4 GB
> > > or 6 GB of heap space.
> > >
> > > So my question is this...
> > >
> > > Given the data, how do I troubleshoot this and provide helpful feedback? I
> > > am running the MapR 1.4 Developer Release right now. To me this seems to be
> > > an issue: why would a single query be able to crash a node? Shouldn't the
> > > query be terminated instead? Even so, why would 30 days of 500 MB of data
> > > crash with that sort of aggregation, given that loading the ENTIRE data set
> > > into RAM would take 15 GB of direct memory per node, which is available?
> > >
> >
>
