Hi,
It seems that null values can cause a column to be treated as numeric
during expression evaluation, regardless of its content or other indicators,
and that fields in substructures can affect same-named fields in the parent
structure.
(1.2-SNAPSHOT, parquet files)
I have JSON data that can be reduced to this:
- {"occurred_at":"2015-07-26 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}}
- {"occurred_at":"2015-07-26 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}}
- {"occurred_at":"2015-07-26 08:45:44.241","type":"plan.item.removed","dimensions":{"type":"To See","category":"Nature","sub_category":"Waterfalls"}}
* Note the discrepancy in the dimensions structure: the type field is
called either type or dim_type (slightly relevant for the rest of this case).
*1. Query where dimensions are not involved*
select p.type, count(*)
from dfs.tmp.`/analytics/processed/<some-tenant>/events` as p
where occurred_at > '2015-07-26'
  and p.type in ('plan.item.added','plan.item.removed')
group by p.type;
+--------------------+---------+
| type               | EXPR$1  |
+--------------------+---------+
| plan.item.removed  | 947     |
| plan.item.added    | 40342   |
+--------------------+---------+
2 rows selected (0.508 seconds)
*2. Same query, but also involving dimensions.type*
select p.type,
  coalesce(p.dimensions.dim_type, p.dimensions.type) dimensions_type,
  count(*)
from dfs.tmp.`/analytics/processed/<some-tenant>/events` as p
where occurred_at > '2015-07-26'
  and p.type in ('plan.item.added','plan.item.removed')
group by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type);
Error: SYSTEM ERROR: NumberFormatException: To See
Fragment 2:0
[Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010] (state=,code=0)
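For comparison, the values the coalesce should produce for the three sample
records above can be sketched outside Drill. This is only a small Python
illustration of the expected semantics, not Drill code; the names SAMPLE,
coalesce and dimension_types are made up for the example, and it assumes a
missing dim_type key behaves like a null.

```python
import json

# The three sample records from this message, one JSON document per line.
SAMPLE = """\
{"occurred_at":"2015-07-26 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:44.241","type":"plan.item.removed","dimensions":{"type":"To See","category":"Nature","sub_category":"Waterfalls"}}
"""


def coalesce(*values):
    """Return the first argument that is not None (SQL COALESCE semantics)."""
    for value in values:
        if value is not None:
            return value
    return None


records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]

# Assumption: an absent dim_type key is treated the same as an explicit null.
dimension_types = [
    coalesce(r["dimensions"].get("dim_type"), r["dimensions"].get("type"))
    for r in records
]

for value in dimension_types:
    print(value)
```

Every value in the sample is either a string or null, so nothing in the data
itself calls for a numeric conversion.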
I can provide test data if this is not enough to reproduce this bug.
Regards,
-Stefán