Creating the table

CREATE TABLE foo
    (tags MAP<String, String>, size int);
INSERT INTO foo VALUES
    (map("country", "USA"), 10),
    (map("country", "USA"), 20),
    (map("country", "UK"), 30);

SELECT * FROM foo

+--------------------+-----------+
|      foo.tags      | foo.size  |
+--------------------+-----------+
| {"country":"USA"}  | 10        |
| {"country":"USA"}  | 20        |
| {"country":"UK"}   | 30        |
+--------------------+-----------+

Aggregating the Table

SELECT DISTINCT tags["country"] from foo;

+-------+
|  _c0  |
+-------+
| USA   |
| NULL  |
+-------+

SELECT tags["country"], sum(size) FROM foo GROUP BY tags["country"];

+-------+------+
|  _c0  | _c1  |
+-------+------+
| USA   | 10   |
| NULL  | 50   |
+-------+------+

And even more strangely, with a subquery:

SELECT flattened.country, sum(flattened.size)
FROM (
    SELECT tags["country"] as country, size
    FROM foo
    WHERE tags["country"] IS NOT NULL
) as flattened
GROUP BY flattened.country;

+--------------------+------+
| flattened.country  | _c1  |
+--------------------+------+
| USA                | 10   |
+--------------------+------+

---------------

Is there any way to stop this from happening and get the correct aggregation 
behaviour? The only method I've found is to create a new table using the query, 
write that to disk and then do the aggregation on that.

Reply via email to