Re: Select distinct on partitioned column requires reading all the files?

Gopal Vijayaraghavan Mon, 23 Feb 2015 22:37:00 -0800

Hi,

Are you sure you have set

hive.optimize.metadataonly=true ?

I'm not saying it will complete instantaneously (it may even be very slow,
due to the lack of a temp-table optimization for that case), but it won't
read any part of the actual table.
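
A minimal sketch of what that looks like in a Hive session, using the
placeholder table and column names from the original question. The EXPLAIN
step and the SHOW PARTITIONS alternative are illustrative additions, not
something tested against this particular setup, so treat them as
assumptions:

```sql
-- Enable the metadata-only optimization for the current session.
SET hive.optimize.metadataonly=true;

-- Inspect the plan first; table and column names are the placeholders
-- from the original question, not real objects.
EXPLAIN
SELECT COUNT(DISTINCT partitioned_column_name) FROM my_partitioned_table;

-- Then run the query itself.
SELECT COUNT(DISTINCT partitioned_column_name) FROM my_partitioned_table;

-- Assumption: if only the list of partition values is needed, this
-- statement reads them from the metastore rather than from the data files.
SHOW PARTITIONS my_partitioned_table;
```

Setting the property per session keeps the change local; whether the query
is actually fast still depends on the number of partitions.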

Cheers,
Gopal

From:  Stephen Boesch <[email protected]>
Reply-To:  "[email protected]" <[email protected]>
Date:  Monday, February 23, 2015 at 10:26 PM
To:  "[email protected]" <[email protected]>
Subject:  Select distinct on partitioned column requires reading all the files?


When querying a Hive table by a partitioning column, it would be logical
that a simple
select count(distinct partitioned_column_name) from my_partitioned_table
would complete almost instantaneously.
But we are seeing that both Hive and Impala are unable to execute this query
efficiently: they just read the entire table!
What do we need to do to ensure the above query executes rapidly?
