>So in a nutshell in Hive if "external" indexes are not used for improving >query response, what value they add and can we forget them for now?
The builtin indexes - those that write data as smaller tables are only useful in a pre-columnar world, where the indexes offer a huge reduction in IO. Part #1 of using hive indexes effectively is to write your own HiveIndexHandler, with usesIndexTable=false; And then write a IndexPredicateAnalyzer, which lets you map arbitrary lookups into other range conditions. Not coincidentally - we're adding a "ANALYZE TABLE ... CACHE METADATA" which consolidates the "internal" index into an external store (HBase). Some of the index data now lives in the HBase metastore, so that the inclusion/exclusion of whole partitions can be done off the consolidated index. https://issues.apache.org/jira/browse/HIVE-11676 The experience from BI workloads run by customers is that in general, the lookup to the right "slice" of data is more of a problem than the actual aggregate. And that for a workhorse data warehouse, this has to survive even if there's a non-stop stream of updates into it. Cheers, Gopal