Hi Robert,

As per https://issues.apache.org/jira/browse/HIVE-1644, Hive 0.8 introduces automatic use of indexes. That might come in handy too!
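Here is a rough sketch of what HIVE-1644 enables, assuming Hive 0.8 and a compact index that has already been built and rebuilt. You switch the optimizer on with `hive.optimize.index.filter`. After that, a plain filtered SELECT can use the index without the manual temp-file and input-format steps.

A few names here are placeholders or assumptions:
- The table `detail`, the column `key` and the literal value are hypothetical.
- Lowering `hive.optimize.index.filter.compact.minsize` is an assumption for testing only. As I understand it, that setting keeps Hive from using the index on small inputs.

```sql
-- Hive 0.8+: let the optimizer rewrite filters to use compact indexes (HIVE-1644)
SET hive.optimize.index.filter=true;
-- Assumption: lower the minimum input size so the index is also used on small test tables
SET hive.optimize.index.filter.compact.minsize=0;

-- Hypothetical table/column: no hive.index.compact.file or hive.input.format settings needed
SELECT * FROM detail WHERE key = 12345;
```

Running EXPLAIN on the SELECT should show whether the index was actually picked up.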
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com
www: fxtrade.com
e: mgro...@oanda.com

----- Original Message -----
From: "Bejoy Ks" <bejoy...@yahoo.com>
To: user@hive.apache.org
Sent: Tuesday, February 21, 2012 11:47:56 AM
Subject: Re: help with compression and index

Hi Hamilton,

When you are doing the indexing (generating the index files), is compression enabled? If so, you are running into this known issue, https://issues.apache.org/jira/browse/HIVE-2331, which is fixed in Hive 0.8. An upgrade should get it working for you and is recommended.

Regards,
Bejoy.K.S

From: "Hamilton, Robert (Austin)" <robert.hamil...@hp.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Tuesday, February 21, 2012 8:48 PM
Subject: help with compression and index

Hi all. I sent this to common-user@hadoop hoping there was an easy answer, but got no response.

I have a couple of users whose only use case is extracting specific rows based on a predetermined set of keys. I would like to provide them with an index and show them how to join it to the detail table. So I'm looking for a reliable compression + index method with Hive.

To give an idea of the data size, my files add up to about 80 TB uncompressed, currently gzipped down to only 10 TB. I need to keep it small(ish) until I can get more disk space, so it has to stay compressed. I don't mind recompressing to LZO or bzip, but I need to prove that it would actually work first :)

I've done my testing on LZO-compressed and uncompressed test samples. With uncompressed files, the indexed select works OK. With LZO, it returns only a fraction of the rows I expect. I gather that files compressed with other methods cannot be indexed at all with Hive 0.7.1?
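If upgrading is not immediately possible, one possible workaround on 0.7.1 is to disable output compression only while the index files are generated. The base table stays compressed. This is a hedged sketch that assumes the HIVE-2331 problem is triggered by `hive.exec.compress.output` being on during index generation. The index name `detail_key_idx` and table `detail` are hypothetical.

```sql
-- Print the current value (the CLI echoes hive.exec.compress.output=...)
SET hive.exec.compress.output;

-- Assumption: turn output compression off for the index-generation steps only
SET hive.exec.compress.output=false;
ALTER INDEX detail_key_idx ON detail REBUILD;
-- ...write the bucket/offset selection to the temporary directory here...

-- Turn it back on afterwards if it was enabled before
SET hive.exec.compress.output=true;
```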
I'm following the prescription: select buckets/offsets into a temporary file, set hive.index.compact.file to that file, set hive.input.format to HiveCompactIndexInputFormat, and run my select. That doesn't let me do subselects, but I don't mind, as it is only a very limited use case that I need to support. This is the only method I could find documented on the net. Is there a better way to do this? I don't mind upgrading Hive (currently 0.7.1) or Hadoop (currently 0.20.2).
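The manual procedure described above looks roughly like this in HiveQL on 0.7.x. It is a minimal sketch, not a tested recipe.

A few names here are placeholders or assumptions:
- The table `detail`, column `key`, index name `detail_key_idx` and directory `/tmp/index_result` are hypothetical.
- The auto-generated index table name assumes Hive's usual `<db>__<table>_<index>__` naming.

```sql
-- 1. Build a compact index on the lookup column (deferred, then rebuilt explicitly)
CREATE INDEX detail_key_idx ON TABLE detail (key)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
ALTER INDEX detail_key_idx ON detail REBUILD;

-- 2. Write the matching bucket (file) names and offsets to a temporary directory
INSERT OVERWRITE DIRECTORY '/tmp/index_result'
SELECT `_bucketname`, `_offsets`
FROM default__detail_detail_key_idx__
WHERE key = 12345;

-- 3. Point the compact-index input format at that result
SET hive.index.compact.file=/tmp/index_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- 4. Run the select: input is narrowed to the listed files/offsets,
--    but the WHERE clause is still needed to filter the actual rows
SELECT * FROM detail WHERE key = 12345;
```

The `hive.input.format` setting stays in effect for the rest of the session. You may want to reset it before running unrelated queries.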