Hi Hamilton
    When you build the index (generate the index files), is compression enabled?
If so, you are running into this known issue:
https://issues.apache.org/jira/browse/HIVE-2331

It is fixed in Hive 0.8. Upgrading is recommended and should get this working 
for you.

Regards
Bejoy.K.S




________________________________
 From: "Hamilton, Robert (Austin)" <robert.hamil...@hp.com>
To: "user@hive.apache.org" <user@hive.apache.org> 
Sent: Tuesday, February 21, 2012 8:48 PM
Subject: help with compression and index
 
Hi all. I sent this to common-user@hadoop hoping there was an easy answer but 
got no response.

I have a couple of users who basically have no use case other than the need to 
extract specific rows based on some predetermined set of keys, so I would like 
to be able to just provide them with an index and show them how to join to the 
detail table using the index.  So I'm looking for a reliable compression+index 
method with Hive.  To give an idea of the data size, my files add up to about 
80 TB uncompressed but are currently gzipped to only 10 TB. I need to keep it 
small(ish) until I can get more disk space, so it has to stay compressed. 

I don't mind recompressing to LZO or bzip but need to prove that it would 
actually work first :)

I've done my testing on LZO and uncompressed test samples. If I use 
uncompressed files the indexed select works OK. If I use LZO it returns only a 
fraction of the rows I expect.  I gather that files compressed with other 
compression methods cannot be indexed at all with Hive 0.7.1?

I'm following the prescription to select buckets/offsets into a temporary file, 
set hive.index.compact.file to the temp file, set hive.input.format to 
HiveCompactIndexInputFormat and run my select.  That doesn't let me do 
subselects but I don't mind as it is only a very limited use case that I need 
to support.
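
For reference, the steps above look roughly like the sketch below. The table 
name `detail`, key column `k`, index name `detail_idx`, key value, and temporary 
path are all hypothetical, and the index table name assumes Hive's default 
naming for an index in the `default` database. It illustrates the manual 
compact-index procedure and is not a tested recipe.

```sql
-- Hypothetical names: table `detail`, key column `k`, index `detail_idx`.
-- The index table name assumes Hive's default naming in the `default` database.

-- 1. Write the matching bucket names and offsets to a temporary directory.
INSERT OVERWRITE DIRECTORY '/tmp/detail_idx_result'
SELECT `_bucketname`, `_offsets`
FROM default__detail_detail_idx__
WHERE k = 'some-key';

-- 2. Point Hive at that result and switch to the compact index input format.
SET hive.index.compact.file=/tmp/detail_idx_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- 3. Run the select against the detail table; only the indexed blocks are read.
SELECT * FROM detail WHERE k = 'some-key';
```

The predicate is repeated in step 3 because the index only narrows down which 
blocks are read, so the rows in those blocks still need to be filtered.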

This is the only method I could find documented on the net.  Is there a better 
way to do this? I don't mind upgrading Hive (currently on 0.7.1) or Hadoop 
(currently 0.20.2).
