How can I store data efficiently in Hive, and how can I store and
retrieve compressed data in Hive?

Currently I am storing it as a TextFile.

I was going through Bejoy's article (
http://kickstarthadoop.blogspot.com/2011/10/how-to-efficiently-store-data-in-hive.html)
and found that LZO compression would be a good fit for storing the files,
and that it is also splittable.



I have a HiveQL SELECT query that generates some output, and I store
that output so that one of my Hive tables (quality) can use the data and
I can then query that table.



Below is the quality table, followed by the SELECT query that loads data
into it. Each load overwrites one partition of the table.



    create table quality
    (id bigint,
      total bigint,
      error bigint
     )
     partitioned by (ds string)
    row format delimited fields terminated by '\t'
    stored as textfile
    location '/user/uname/quality'
    ;

    insert overwrite table quality partition (ds='20120709')
    SELECT id, count2, coalesce(error, cast(0 AS BIGINT)) AS count1 FROM Table1;





So currently I am storing the data as a TextFile. Should I switch to a
SequenceFile and start storing the data with LZO compression, or will a
text file be fine here as well? The SELECT query produces several GB of
data, which needs to be loaded into the quality table on a daily basis.
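
For concreteness, the SequenceFile variant I am considering might look
roughly like the sketch below. This is only an assumption on my part, not
something I have run: the table name quality_seq and its location are made
up for illustration, the codec class assumes the hadoop-lzo library is
installed and registered on the cluster, and the SET properties use the
older mapred.* names.

```sql
-- Hypothetical SequenceFile copy of the quality table
-- (table name and location are illustrative only).
create table quality_seq
(id bigint,
 total bigint,
 error bigint
)
partitioned by (ds string)
stored as sequencefile
location '/user/uname/quality_seq';

-- Ask Hive to compress the final output of the insert job.
SET hive.exec.compress.output=true;
-- Compress SequenceFile records in blocks rather than one by one.
SET mapred.output.compression.type=BLOCK;
-- Assumption: hadoop-lzo is installed, so this codec class is available.
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Same daily load as before, now writing compressed SequenceFiles.
insert overwrite table quality_seq partition (ds='20120709')
SELECT id, count2, coalesce(error, cast(0 AS BIGINT)) AS count1 FROM Table1;
```

As far as I understand, queries against such a table would be written
exactly as before (for example with a filter on ds='20120709'), because
Hive decompresses the data on read as long as the codec is available.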



So which way is best? Should I store the output as a TextFile or as a
SequenceFile (with LZO compression) so that queries against the Hive
quality table run faster?
