Hi, Would it not be possible to determine this on first-value or know if a "unused default type" is being used and then change it once a value presents it self?
Regards, -Stefán On Wed, Jul 29, 2015 at 1:59 AM, Steven Phillips <sphill...@maprtech.com> wrote: > This issue is addressed in DRILL-3477. > > My proposed solution is to use VarBinary as the default type, as this has > the lowest precedence in the rules for implicit cast. > > Making this change breaks some other things though, so I haven't been able > to merge the fix yet. > > On Tue, Jul 28, 2015 at 2:23 PM, Parth Chandra <par...@apache.org> wrote: > > > Hi Stefan > > This is the same old issue: Drill does an initial scan to determine the > > type of a field. In cases where Drill encounters nulls in the data it > > defaults to using a Nullable Int as the type (not a good choice perhaps). > > This leads to all sorts of issues (most of which you're hitting). > > There is an effort to improve this (DRILL-3228) but it will be a while > > before this work is completed. > > > > In the meantime, I can only suggest a workaround : use as cast around > > your columns - > > > > select p.type, coalesce( cast(p.dimensions.dim_type as varchar(20)), > > cast(p.dimensions.type as varchar(20))) dimensions_type, count(*) from > > `test.json` as p where occurred_at > '2015-07-26' and p.type in > > ('plan.item.added','plan.item.removed') group by p.type, > > coalesce(cast(p.dimensions.dim_type as varchar(20)), > cast(p.dimensions.type > > as varchar(20))); > > > > > > > > > > > > On Mon, Jul 27, 2015 at 4:59 AM, Stefán Baxter < > ste...@activitystream.com> > > wrote: > > > > > Hi, > > > > > > It seems that null values can trigger a column to be treated as a > numeric > > > one, in expressions evaluation, regardless of content or other > indicators > > > and that fields in substructures can affect same-named-fields in parent > > > structure. > > > (1.2-SNAPSHOT, parquet files) > > > > > > I have JSON data that can be reduced to to this: > > > > > > - {"occurred_at":"2015-07-26 > > > > > > > > > 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}} > > > - {"occurred_at":"2015-07-26 > > > > > > > > > 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}} > > > - {"occurred_at":"2015-07-26 > > > 08:45:44.241","type":"plan.item.removed","dimensions":{"type":"To > > > See","category":"Nature","sub_category":"Waterfalls"}} > > > > > > * notice the discrepancy in the dimensions structure that the type > field > > is > > > either called type or dim_type (slightly relevant for the rest of this > > > case) > > > > > > > > > *1. Query where dimensions are not involved* > > > > > > select p.type, count(*) from > > > dfs.tmp.`/analytics/processed/<some-tenant>/events` as p where > > occurred_at > > > > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed') > > group > > > by p.type; > > > +--------------------+---------+ > > > | type | EXPR$1 | > > > +--------------------+---------+ > > > | plan.item.removed | 947 | > > > | plan.item.added | 40342 | > > > +--------------------+---------+ > > > 2 rows selected (0.508 seconds) > > > > > > > > > *2. Same query but involves dimension.type as well* > > > > > > select p.type, coalesce(p.dimensions.dim_type, p.dimensions.type) > > > dimensions_type, count(*) from > > > dfs.tmp.`/analytics/processed/<some-tenant>/events` as p where > > occurred_at > > > > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed') > > group > > > by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type); > > > > > > Error: SYSTEM ERROR: NumberFormatException: To See > > > Fragment 2:0 > > > [Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010] > > > (state=,code=0) > > > > > > > > > I can provide test data if this is not enough to reproduce this bug. > > > > > > Regards, > > > -Stefán > > > > > > > > > -- > Steven Phillips > Software Engineer > > mapr.com >