Re: Type confusion and number formatting exceptions

Stefán Baxter Wed, 29 Jul 2015 00:07:55 -0700

Hi,

Would it not be possible to determine this on first-value or know if a
"unused default type" is being used and then change it once a value
presents it self?


Regards,
 -Stefán

On Wed, Jul 29, 2015 at 1:59 AM, Steven Phillips <sphill...@maprtech.com>
wrote:

> This issue is addressed in DRILL-3477.
>
> My proposed solution is to use VarBinary as the default type, as this has
> the lowest precedence in the rules for implicit cast.
>
> Making this change breaks some other things though, so I haven't been able
> to merge the fix yet.
>
> On Tue, Jul 28, 2015 at 2:23 PM, Parth Chandra <par...@apache.org> wrote:
>
> > Hi Stefan
> >   This is the same old issue: Drill does an initial scan to determine the
> > type of a field. In cases where Drill encounters nulls in the data it
> > defaults to using a Nullable Int as the type (not a good choice perhaps).
> >   This leads to all sorts of issues (most of which you're hitting).
> >   There is an effort to improve this (DRILL-3228) but it will be a while
> > before this work is completed.
> >
> >   In the meantime, I can only suggest a workaround : use as cast around
> > your columns -
> >
> >  select p.type, coalesce( cast(p.dimensions.dim_type as varchar(20)),
> > cast(p.dimensions.type as varchar(20))) dimensions_type, count(*) from
> > `test.json` as p where occurred_at > '2015-07-26' and p.type in
> > ('plan.item.added','plan.item.removed') group by p.type,
> > coalesce(cast(p.dimensions.dim_type as varchar(20)),
> cast(p.dimensions.type
> > as varchar(20)));
> >
> >
> >
> >
> >
> > On Mon, Jul 27, 2015 at 4:59 AM, Stefán Baxter <
> ste...@activitystream.com>
> > wrote:
> >
> > > Hi,
> > >
> > > It seems that null values can trigger a column to be treated as a
> numeric
> > > one, in expressions evaluation, regardless of content or other
> indicators
> > > and that fields in substructures can affect same-named-fields in parent
> > > structure.
> > > (1.2-SNAPSHOT, parquet files)
> > >
> > > I have JSON data that can be reduced to to this:
> > >
> > >    - {"occurred_at":"2015-07-26
> > >
> > >
> >
> 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}}
> > >    - {"occurred_at":"2015-07-26
> > >
> > >
> >
> 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}}
> > >    - {"occurred_at":"2015-07-26
> > >    08:45:44.241","type":"plan.item.removed","dimensions":{"type":"To
> > >    See","category":"Nature","sub_category":"Waterfalls"}}
> > >
> > > * notice the discrepancy in the dimensions structure that the type
> field
> > is
> > > either called type or dim_type (slightly relevant for the rest of this
> > > case)
> > >
> > >
> > > *1. Query where dimensions are not involved*
> > >
> > > select p.type, count(*) from
> > > dfs.tmp.`/analytics/processed/<some-tenant>/events` as p where
> > occurred_at
> > > > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed')
> > group
> > > by p.type;
> > > +--------------------+---------+
> > > |        type        | EXPR$1  |
> > > +--------------------+---------+
> > > | plan.item.removed  | 947     |
> > > | plan.item.added    | 40342   |
> > > +--------------------+---------+
> > > 2 rows selected (0.508 seconds)
> > >
> > >
> > > *2. Same query but involves dimension.type as well*
> > >
> > > select p.type, coalesce(p.dimensions.dim_type, p.dimensions.type)
> > > dimensions_type, count(*) from
> > > dfs.tmp.`/analytics/processed/<some-tenant>/events` as p where
> > occurred_at
> > > > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed')
> > group
> > > by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type);
> > >
> > > Error: SYSTEM ERROR: NumberFormatException: To See
> > > Fragment 2:0
> > > [Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010]
> > > (state=,code=0)
> > >
> > >
> > > I can provide test data if this is not enough to reproduce this bug.
> > >
> > > Regards,
> > >  -Stefán
> > >
> >
>
>
>
> --
>  Steven Phillips
>  Software Engineer
>
>  mapr.com
>

Re: Type confusion and number formatting exceptions

Reply via email to