i'm running hive version 2.1.0 and found this interesting. i've broken it
down into a trivial test case below.
i run this:
select a.date_key,
a.property_id,
cast(NULL as bigint) as malone_id,
cast(NULL as bigint) as zpid,
sum(coalesce(a.sum_365,0)) as sum_365
from
(select 20170626 as date_key, 123 as property_id, 10 as
sum_365 ) a
group by
1
,2
,3
,4
and i get:
+----------+-------------+-----------+--------+---------+
| date_key | property_id | malone_id | zpid | sum_365 |
+----------+-------------+-----------+--------+---------+
| 20170626 | 123 | [NULL] | [NULL] | *[NULL]* |
+----------+-------------+-----------+--------+---------+
why do you suppose sum_365 is null?
then i run this:
select a.date_key,
a.property_id,
cast(NULL as bigint) as malone_id,
sum(coalesce(a.sum_365,0)) as sum_365
from
(select 20170626 as date_key, 123 as property_id, 10 as
sum_365 ) a
group by
1
,2
,3
and i get:
+----------+-------------+-----------+---------+
| date_key | property_id | malone_id | sum_365 |
+----------+-------------+-----------+---------+
| 20170626 | 123 | [NULL] | *10 * |
+----------+-------------+-----------+---------+
what do you think? is it me? or is it hive? (looks to me grouping by two
NULL's in a row causes a problem.)