Re: help with compression and index

Mark Grover Tue, 21 Feb 2012 14:04:29 -0800

Hi Robert,
As per https://issues.apache.org/jira/browse/HIVE-1644, Hive 0.8 introduces 
automatic accessing of indexes. That might come in handy too!
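
A minimal sketch of what that might look like on Hive 0.8 or later. It assumes the automatic index use from HIVE-1644 is switched on through the hive.optimize.index.filter property; the table and column names (detail, key_col) are hypothetical:

```sql
-- Assumption: Hive 0.8 or later, with a compact index already built on detail(key_col).
-- `detail` and `key_col` are hypothetical names used only for illustration.

-- Let the optimizer rewrite filtered queries to use an index automatically.
SET hive.optimize.index.filter=true;

-- Optional: minimum input size (in bytes) before a compact index is considered.
-- Setting it to 0 makes small test tables eligible as well.
SET hive.optimize.index.filter.compact.minsize=0;

-- An ordinary filtered select; no manual bucket/offset file is needed.
SELECT * FROM detail WHERE key_col = 'k1';
```

With this enabled, the query is written as a plain select on the detail table and the index lookup happens behind the scenes.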


Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgro...@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "Bejoy Ks" <bejoy...@yahoo.com>
To: user@hive.apache.org
Sent: Tuesday, February 21, 2012 11:47:56 AM
Subject: Re: help with compression and index



Hi Hamilton 
When you generate the index files, is compression enabled? If so, you are 
running into this known issue: 
https://issues.apache.org/jira/browse/HIVE-2331 


It is fixed in Hive 0.8. An upgrade should get it rolling for you and is 
recommended. 


Regards 
Bejoy.K.S 








From: "Hamilton, Robert (Austin)" <robert.hamil...@hp.com> 
To: "user@hive.apache.org" <user@hive.apache.org> 
Sent: Tuesday, February 21, 2012 8:48 PM 
Subject: help with compression and index 

Hi all. I sent this to common-user@hadoop hoping there was an easy answer but 
got no response. 

I have a couple of users who basically have no use case other than the need to 
extract specific rows based on some predetermined set of keys, so I would like 
to be able to just provide them with an index and show them how to join to the 
detail table using the index. So I'm looking for a reliable compression+index 
method with Hive. To give an idea of the data size: my files add up to about 
80 TB uncompressed but are currently gzipped to only 10 TB. I need to keep it 
smallish until I can get more disk space, so it has to stay compressed. 

I don't mind recompressing to LZO or bzip but need to prove that it would 
actually work first :) 

I've done my testing on LZO and uncompressed test samples. If I use 
uncompressed files the indexed select works OK. If I use LZO it returns only a 
fraction of the rows I expect. I gather that files compressed with other 
compression methods cannot be indexed at all with Hive 0.7.1? 

I'm following the prescription to select buckets/offsets into a temporary file, 
set hive.index.compact.file to the temp file, set hive.input.format to 
HiveCompactIndexInputFormat, and run my select. That doesn't let me do 
subselects, but I don't mind, as it is only a very limited use case that I need 
to support. 
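
For anyone following along, the steps above look roughly like this. It is only a sketch: the table, column, index name and temp path (detail, key_col, detail_key_idx, /tmp/index_result) are hypothetical, and the index table name assumes Hive's usual <db>__<table>_<index>__ naming, so check yours with SHOW TABLES:

```sql
-- Hypothetical names throughout: detail, key_col, detail_key_idx, /tmp/index_result.

-- 1. Build a compact index on the lookup key.
CREATE INDEX detail_key_idx ON TABLE detail (key_col)
  AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX detail_key_idx ON detail REBUILD;

-- 2. Select the bucket names and offsets for the wanted keys into a temp file.
--    The index table name below is an assumption based on the default naming scheme.
INSERT OVERWRITE DIRECTORY '/tmp/index_result'
SELECT `_bucketname`, `_offsets`
FROM default__detail_detail_key_idx__
WHERE key_col = 'k1';

-- 3. Point Hive at the temp file and switch to the index-aware input format.
SET hive.index.compact.file=/tmp/index_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- 4. Run the select against the detail table; only the listed blocks are read.
SELECT * FROM detail WHERE key_col = 'k1';
```

The filter in step 4 repeats the one in step 2 because the index only narrows which blocks are read; the rows inside those blocks still have to be filtered.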

This is the only method I could find documented on the net. Is there a better 
way to do this? I don't mind upgrading Hive (currently on 0.7.1) or Hadoop 
(currently 0.20.2).
