Creating the table CREATE TABLE foo (tags MAP<String, String>, size int); INSERT INTO foo VALUES (map("country", "USA"), 10), (map("country", "USA"), 20), (map("country", "UK"), 30);
SELECT * FROM foo +--------------------+-----------+ | foo.tags | foo.size | +--------------------+-----------+ | {"country":"USA"} | 10 | | {"country":"USA"} | 20 | | {"country":"UK"} | 30 | +--------------------+-----------+ Aggregating the Table SELECT DISTINCT tags["country"] from foo; +-------+ | _c0 | +-------+ | USA | | NULL | +-------+ SELECT tags["country"], sum(size) FROM foo GROUP BY tags["country"]; +-------+------+ | _c0 | _c1 | +-------+------+ | USA | 10 | | NULL | 50 | +-------+------+ And even more strangely, with a subquery: SELECT flattened.country, sum(flattened.size) FROM ( SELECT tags["country"] as country, size FROM foo WHERE tags["country"] IS NOT NULL ) as flattened GROUP BY flattened.country; +--------------------+------+ | flattened.country | _c1 | +--------------------+------+ | USA | 10 | +--------------------+------+ --------------- Is there any way to stop this from happening and get the correct aggregation behaviour? The only method I've found is to create a new table using the query, write that to disk and then do the aggregation on that.