TO_DATE takes a long that it assumes is a Unix timestamp, so the error you are getting here comes from an implicit cast trying to turn the string into a long before converting it to a date. You can pass a format string as a second parameter to tell TO_DATE how to parse these kinds of dates properly.
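For example, something like this in the view should avoid the implicit cast. This is just a sketch: the 'yyyy-MM-dd' Joda-style pattern is an assumption matching directory names like 2015-11-12, and the table path and field names are placeholders based on the description below.

select to_date(dir0, 'yyyy-MM-dd') as sub_date, field1, field2, field3
from dfs.`/path/to/TABLE`;

The format pattern syntax is covered on the docs page linked below.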
https://drill.apache.org/docs/data-type-conversion/#to_date

On Tue, Mar 1, 2016 at 9:17 AM, John Omernik <j...@omernik.com> wrote:

> In the view I have select to_date(dir0) as sub_date...
>
> When I run a query, I am getting "Error: SYSTEM ERROR:
> NumberFormatException: 2015-11-12"
>
> *even though I am using a where sub_date >= '2016-02-20', although I think
> this has to do with the planning slowness I've spoken about
>
> On Tue, Mar 1, 2016 at 10:14 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
>
> > In the view.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Mar 1, 2016 at 6:02 AM, John Omernik <j...@omernik.com> wrote:
> >
> > > In the view or in the query?
> > >
> > > On Mon, Feb 29, 2016 at 9:05 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > >
> > > > Can you try to convert src_date to a date type?
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Mon, Feb 29, 2016 at 10:28 AM, John Omernik <j...@omernik.com> wrote:
> > > >
> > > > > I am running 6 drillbits. They were running with 20 GB of direct
> > > > > memory and 4 GB of heap, and I altered them to run with 18 GB of
> > > > > direct and 6 GB of heap, and I am still getting this error.
> > > > >
> > > > > I am running a query and trying to understand why so much heap space
> > > > > is being used. The data is Parquet files, organized into directories
> > > > > by date (2015-01-01, 2015-01-02, etc.):
> > > > >
> > > > > TABLE
> > > > > ---> 2015-01-01
> > > > > ---> 2015-01-02
> > > > >
> > > > > etc.
> > > > >
> > > > > This data isn't what I would call "huge": at most 500 MB per day,
> > > > > with 69 Parquet files per day. While I do have the planning issue
> > > > > related to lots of directories with lots of files (see other emails),
> > > > > I don't think that is related here.
> > > > >
> > > > > I have a view that is basically select dir0 as src_date, field1,
> > > > > field2, field3 from table, and then I run a query such as
> > > > >
> > > > > select src_date, count(1) from view_table where src_date >=
> > > > > '2016-02-25' group by src_date
> > > > >
> > > > > That will work.
> > > > >
> > > > > If I run
> > > > >
> > > > > select src_date, count(1) from view_table where src_date >=
> > > > > '2016-02-01' group by src_date
> > > > >
> > > > > that will hang, and eventually I will see a drillbit crash and
> > > > > restart, and the error logs point to Java heap space issues. This is
> > > > > the same on 4 GB or 6 GB of heap space.
> > > > >
> > > > > So my question is this...
> > > > >
> > > > > Given the data, how do I troubleshoot this and provide helpful
> > > > > feedback? I am running the MapR 1.4 Developer Release right now.
> > > > > This to me seems to be an issue: why would a single query be able to
> > > > > crash a node? Shouldn't the query be terminated? Even so, why would
> > > > > 30 days of 500 MB of data crash given that sort of aggregation (i.e.,
> > > > > it would take 15 GB of direct RAM per node, which is available, to
> > > > > load the ENTIRE data set into RAM)?
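Putting the suggestions in this thread together, a sketch of the view change being described might look like the following. The workspace, view name, and path are placeholders, and this is untested against your data.

create or replace view dfs.tmp.view_table as
select to_date(dir0, 'yyyy-MM-dd') as src_date, field1, field2, field3
from dfs.`/path/to/TABLE`;

With src_date produced as a real DATE, the filter can then compare date to date instead of string to string, e.g.:

select src_date, count(1)
from dfs.tmp.view_table
where src_date >= date '2016-02-01'
group by src_date;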