Re: Hive Index and ORC

Gopal V Tue, 09 Sep 2014 12:22:16 -0700

On 9/6/14, 9:36 AM, Alain Petrus wrote:

I am wondering whether it is possible to use a Hive index with the ORC format. Does it make sense?

ORC maintains its own indexes within the file: by default, one index record every 10,000 rows (the stride is set by orc.row.index.stride, and orc.create.index turns index creation on or off).
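
For illustration, here is a minimal sketch of a hypothetical table (events_orc and its columns are made up) that spells out both properties. The values shown are the defaults, so writing them explicitly is optional.

```sql
-- Hypothetical table; both ORC properties are shown at their default values.
CREATE TABLE events_orc (
  event_id   BIGINT,
  event_time TIMESTAMP,
  user_id    STRING
)
STORED AS ORC
TBLPROPERTIES (
  "orc.create.index"     = "true",   -- write row-group indexes into each file
  "orc.row.index.stride" = "10000"   -- rows covered by each index entry
);
```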


You can take advantage of these indexes during a scan with a filter by setting the following option:

hive> set hive.optimize.index.filter=true;
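
As a sketch, assuming a hypothetical ORC table events_orc with a BIGINT column event_id, this is the kind of query that can benefit once the option is set. The ORC reader compares the predicate against the min/max statistics kept for each row group and skips groups that cannot contain matching rows.

```sql
-- Hypothetical query; assumes hive.optimize.index.filter=true is already set.
-- Row groups whose [min, max] range for event_id falls entirely outside
-- the BETWEEN range can be skipped without being read.
SELECT COUNT(*)
FROM events_orc
WHERE event_id BETWEEN 1000000 AND 1005000;
```

Row groups can only be skipped when their min/max ranges exclude the predicate, so this pays off mostly when the data is clustered on the filtered column (for example, inserted sorted by event_id). On randomly ordered data, most row groups will overlap the range and still have to be read.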

A recent IBM paper did include a detailed analysis of ORC's indexing performance, but the indexing is relatively "free" because there is no step other than inserting into an ORC table.

Where ORC helps a lot is when you then run "ANALYZE TABLE" to build the information required to make better query plans, because it will read the stats from the single index record at the bottom (footer) of each ORC file (the "partial scan" mode).
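
A minimal sketch of that step, again on a hypothetical table named events_orc, and assuming a Hive version whose ANALYZE TABLE accepts the PARTIALSCAN option (check the statistics documentation for your release):

```sql
-- Gather table statistics for the hypothetical table without a full data
-- scan; the optimizer can then use them when planning later queries.
ANALYZE TABLE events_orc COMPUTE STATISTICS PARTIALSCAN;
```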


Cheers,
Gopal