> cast(NULL as bigint) as malone_id,
> cast(NULL as bigint) as zpid,
I ran this on master (with text vectorization off) and got:
20170626 123 NULL NULL 10
However, I think the column backtracking is broken somewhere. Both NULLs end up
being represented by a single column, and I suspect that is what breaks text
vectorization.
> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["sum(VALUE._col0)"],keys:20170626, 123, KEY._col2, KEY._col2
Note that _col2 is repeated, even though the output also has a _col3 (_col4 is
the aggregate result).
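The following toy Python model makes the suspected failure concrete. It is hypothetical and is not Hive's actual backtracking code. It shows how resolving key columns by their expression instead of by their position would map both `null (type: bigint)` keys to the same column. The result is the `KEY._col2, KEY._col2` pattern quoted above. The `(expression, type)` key representation and both function names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical key representation: (expression text, type), e.g. ("null", "bigint").
# Hive's planner works on expression-node objects, not tuples like these.
Key = Tuple[str, str]


def backtrack_by_value(keys: List[Key]) -> List[str]:
    """Resolve each key to an output column by looking it up by expression.

    Two distinct columns that both hold NULL (bigint) compare equal here,
    so the second one silently reuses the first one's column name.
    """
    first_seen: Dict[Key, str] = {}
    resolved = []
    for pos, key in enumerate(keys):
        resolved.append(first_seen.setdefault(key, f"KEY._col{pos}"))
    return resolved


def backtrack_by_position(keys: List[Key]) -> List[str]:
    """Resolve each key by its position, so equal constants stay distinct."""
    return [f"KEY._col{pos}" for pos in range(len(keys))]


keys = [("20170626", "int"), ("123", "int"), ("null", "bigint"), ("null", "bigint")]
print(backtrack_by_value(keys))     # ['KEY._col0', 'KEY._col1', 'KEY._col2', 'KEY._col2']
print(backtrack_by_position(keys))  # ['KEY._col0', 'KEY._col1', 'KEY._col2', 'KEY._col3']
```

Keying on position is what keeps two equal-valued constants as two separate columns.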
Hive 1.2 has a similar issue (and I assume 2.1.0 does too):
Group By Operator
  aggregations: sum(COALESCE(10,0))
  keys: 20170626 (type: int), 123 (type: int), null (type: bigint), null (type: bigint)
  mode: hash
  outputColumnNames: _col0, _col1, _col2, _col3, _col4
  Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
  Reduce Output Operator
    key expressions: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
    sort order: +++
    Map-reduce partition columns: 20170626 (type: int), 123 (type: int), _col3 (type: bigint)
    Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
    value expressions: _col3 (type: bigint)
_col4 should have been the value expression, not _col3, and _col2 should have
been in the key expressions and the partition columns (because you're grouping
by 3 columns).
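The expected reduce-sink layout can be derived mechanically, and the following toy model does that. It assumes a simplified planner that ignores constant folding. In that model, the first N map-side Group By outputs are the keys to sort and partition on, and the remaining outputs are aggregates passed as values. The Group By above lists four keys (two int literals and two NULL bigints). For simplicity, the literals 20170626 and 123 are written as `_col0`/`_col1`. The function names are hypothetical.

```python
from typing import Dict, List


def expected_reduce_sink(output_cols: List[str], num_keys: int) -> Dict[str, List[str]]:
    """Split map-side Group By outputs into reduce-sink keys, partition cols and values.

    Simplified model: keys are the first num_keys output columns, every key
    is also a partition column, and the remaining columns are aggregate values.
    """
    keys = output_cols[:num_keys]
    return {"keys": keys, "partition": list(keys), "values": output_cols[num_keys:]}


def diff_reduce_sink(expected: Dict[str, List[str]],
                     observed: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Report, per section, which columns are missing and which are unexpected."""
    report = {}
    for section, exp_cols in expected.items():
        obs_cols = observed.get(section, [])
        report[section] = {
            "missing": [c for c in exp_cols if c not in obs_cols],
            "unexpected": [c for c in obs_cols if c not in exp_cols],
        }
    return report


expected = expected_reduce_sink(["_col0", "_col1", "_col2", "_col3", "_col4"], num_keys=4)
# Observed Reduce Output Operator from the Hive 1.2 plan (literals shown as _col0/_col1).
observed = {
    "keys": ["_col0", "_col1", "_col3"],
    "partition": ["_col0", "_col1", "_col3"],
    "values": ["_col3"],
}
report = diff_reduce_sink(expected, observed)
for section, delta in report.items():
    print(section, delta)
# keys {'missing': ['_col2'], 'unexpected': []}
# partition {'missing': ['_col2'], 'unexpected': []}
# values {'missing': ['_col4'], 'unexpected': ['_col3']}
```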
> what do you think? is it me? or is it hive?
Definitely Hive.
If you file a JIRA, please reproduce it against a 1-row ORC table and report
the vectorization issue as well.
A performant fix would be to handle this the same way I'm trying to fix views
with PTF + filters (i.e. where the filter injects a constant into a window
function):
https://issues.apache.org/jira/browse/HIVE-16541
Doing the same with the GroupBy would prevent constants from showing up in a
group-by like this.
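The rough idea is this: a key that evaluates to the same constant on every row does not change the grouping. Such a key can therefore be dropped from the Group By and projected back in as a literal afterwards. The following sketch shows that rewrite on in-memory rows. It is a hypothetical model of the general technique, not the HIVE-16541 patch. `Literal`, `group_by_sum` and `group_by_without_constants` are made-up names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple, Union

Row = Dict[str, Any]


@dataclass(frozen=True)
class Literal:
    """A group-by key that evaluates to the same constant on every row."""
    value: Any


Key = Union[str, Literal]  # a column name or a constant


def group_by_sum(rows: Sequence[Row], keys: Sequence[Key], measure: str) -> List[Tuple[Any, ...]]:
    """Naive plan: every key, constants included, is part of the grouping tuple."""
    sums: Dict[Tuple[Any, ...], Any] = {}
    for row in rows:
        group = tuple(k.value if isinstance(k, Literal) else row[k] for k in keys)
        sums[group] = sums.get(group, 0) + row[measure]
    return [group + (total,) for group, total in sums.items()]


def group_by_without_constants(rows: Sequence[Row], keys: Sequence[Key],
                               measure: str) -> List[Tuple[Any, ...]]:
    """Rewrite: group only on non-constant keys, then splice the constants back in."""
    variable = [k for k in keys if not isinstance(k, Literal)]
    sums: Dict[Tuple[Any, ...], Any] = {}
    for row in rows:
        group = tuple(row[k] for k in variable)
        sums[group] = sums.get(group, 0) + row[measure]
    result = []
    for group, total in sums.items():
        values = iter(group)
        # Rebuild the full key tuple in the original key order.
        full = tuple(k.value if isinstance(k, Literal) else next(values) for k in keys)
        result.append(full + (total,))
    return result


rows = [
    {"ds": 20170626, "id": 123, "amount": 10},
    {"ds": 20170626, "id": 123, "amount": 5},
    {"ds": 20170626, "id": 456, "amount": 1},
]
keys = ["ds", "id", Literal(None), Literal(None)]  # two NULL constant keys
print(group_by_sum(rows, keys, "amount"))
# [(20170626, 123, None, None, 15), (20170626, 456, None, None, 1)]
print(group_by_without_constants(rows, keys, "amount") == group_by_sum(rows, keys, "amount"))
# True
```

The sketch sidesteps one caveat. In SQL, an aggregate with no GROUP BY keys returns one row even on empty input, whereas a GROUP BY on a constant returns none. A real rewrite therefore has to special-case the situation where every key is constant.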
These plans can also come out of perfectly good engineering. You don't write a
group-by with a "cast(null as bigint)" by hand; you write a view with a group-by
and then query it with "where zpid is null and malone_id is null".
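The following toy model traces that sequence. It assumes a simplified planner in which an `IS NULL` filter on a grouping column is pushed below the group-by. Constant propagation then replaces that column with a typed NULL constant. The view, the column names `ds`/`id`, and the helper `propagate_is_null` are hypothetical. Hive's real constant propagation works on operator trees, not strings.

```python
from typing import Dict, List, Set


def propagate_is_null(group_keys: List[str], is_null_cols: Set[str],
                      col_types: Dict[str, str]) -> List[str]:
    """Rewrite group-by keys after 'col IS NULL' filters are pushed below the group-by.

    Every key known to be NULL becomes a typed NULL constant, mirroring the
    'null (type: bigint)' keys in the plan above.
    """
    return [
        f"null (type: {col_types[k]})" if k in is_null_cols else f"{k} (type: {col_types[k]})"
        for k in group_keys
    ]


# Hypothetical view: SELECT ds, id, malone_id, zpid, sum(...) FROM t GROUP BY ds, id, malone_id, zpid
view_keys = ["ds", "id", "malone_id", "zpid"]
types = {"ds": "int", "id": "int", "malone_id": "bigint", "zpid": "bigint"}

# Query: SELECT * FROM view WHERE zpid IS NULL AND malone_id IS NULL
rewritten = propagate_is_null(view_keys, {"zpid", "malone_id"}, types)
print(rewritten)
# ['ds (type: int)', 'id (type: int)', 'null (type: bigint)', 'null (type: bigint)']
```

Two distinct view columns end up as two identical NULL bigint constants. This is the shape that the column backtracking described earlier appears to trip over.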
Cheers,
Gopal