When querying a Hive table on a partitioning column, one would expect
that a simple
select count(distinct partitioned_column_name) from my_partitioned_table
would complete almost instantaneously, since the distinct values of a
partitioning column are already recorded in the partition metadata.
But we are seeing that neither Hive nor Impala executes this query
efficiently; both appear to read all the files.
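
To make the comparison concrete, here is a minimal sketch, assuming the
table is partitioned by that single column. The table and column names
are the placeholders from the query above, not real objects. SHOW
PARTITIONS is answered from the metastore rather than from the data
files, so it illustrates the kind of metadata-only lookup we would
expect the count to reduce to.

```sql
-- The query in question: in our experience this scans the data files.
select count(distinct partitioned_column_name) from my_partitioned_table;

-- Metadata-only alternative (assumes a single partitioning column):
-- each returned row is one partition, e.g. partitioned_column_name=<value>,
-- so the number of rows equals the number of distinct values.
show partitions my_partitioned_table;
```

With more than one partitioning column, the rows returned by SHOW
PARTITIONS would need to be de-duplicated on the column of interest
before counting.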
Subject: Select distinct on partitioned column requires reading all the
files?
Reply-To: user@hive.apache.org user@hive.apache.org
Date: Monday, February 23, 2015 at 10:26 PM
To: user@hive.apache.org user@hive.apache.org