On 9/6/14, 9:36 AM, Alain Petrus wrote:

I am wondering whether it is possible to use a Hive index with the ORC format?  Does 
it make sense?

ORC maintains its own indexes within the file - one index record every 10,000 rows by default (the interval is set by orc.row.index.stride, and orc.create.index turns the index on or off).
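
As a rough sketch of where those two properties go, they can be set per table at creation time. The table and column names here (events_orc, event_id, payload) are made up for illustration, and the values shown are just the defaults spelled out:

hive> -- hypothetical table; names are placeholders, values are the defaults
hive> CREATE TABLE events_orc (event_id BIGINT, payload STRING)
    > STORED AS ORC
    > TBLPROPERTIES ("orc.create.index"="true", "orc.row.index.stride"="10000");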

You can take advantage of these indexes during scan+filter with the following option:

hive> set hive.optimize.index.filter=true;
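
With that set, a filter on an ORC table is pushed down to the reader, which can use the min/max values in each index record to skip row groups that cannot match. A minimal example, assuming a hypothetical ORC table events_orc with a BIGINT column event_id (neither is from the original question):

hive> -- hypothetical table and column; the range predicate is what gets pushed down
hive> SELECT COUNT(*) FROM events_orc
    > WHERE event_id BETWEEN 1000000 AND 1010000;

How much gets skipped depends on the data layout: it works best when the rows were inserted roughly sorted on the filter column, so that each 10,000-row group covers a narrow range of values.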

A recent IBM paper has some detailed analysis of ORC's indexing performance. Building the index is relatively "free", though, because there is no step other than just inserting into an ORC table.

Where ORC does help a lot is if you then run an "ANALYZE TABLE" to build the statistics needed to make query plans better, because it will read the stats off the single index record at the bottom of each ORC file (the "partial scan" mode).
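
For reference, the partial scan form looks roughly like this, assuming your Hive version accepts the PARTIALSCAN keyword and using a hypothetical table name events_orc:

hive> -- hypothetical table name; gathers stats from the ORC file footers
hive> ANALYZE TABLE events_orc COMPUTE STATISTICS PARTIALSCAN;

For a partitioned table you would add a PARTITION(...) clause after the table name.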

Cheers,
Gopal
