How can I efficiently store data in Hive, and how can I store and retrieve compressed data in Hive?
Currently I am storing it as a TextFile. I was going through Bejoy's article (http://kickstarthadoop.blogspot.com/2011/10/how-to-efficiently-store-data-in-hive.html) and found that LZO compression is a good fit for storing the files, and that it is also splittable.

I have a HiveQL SELECT query that generates some output, and I store that output so that one of my Hive tables (quality) can use it and I can then query that table. Below is the quality table, followed by the statement that loads it from the SELECT query by overwriting a partition of quality:

    create table quality
    (id bigint,
     total bigint,
     error bigint
    )
    partitioned by (ds string)
    row format delimited fields terminated by '\t'
    stored as textfile
    location '/user/uname/quality'
    ;

    insert overwrite table quality partition (ds='20120709')
    SELECT id , count2 , coalesce(error, cast(0 AS BIGINT)) AS count1 FROM Table1;

So currently I am storing it as a TextFile. Should I switch to a SequenceFile and start storing the data with LZO compression, or is a text file fine here as well? The SELECT query returns a few GB of data, which needs to be loaded into the quality table on a daily basis.

Which approach is best? Should I store the output as a TextFile or as a SequenceFile (with LZO compression) so that queries against the Hive quality table are faster?
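
For reference, here is a minimal sketch of what the SequenceFile variant asked about above might look like. It is an illustration under assumptions, not a tested setup: the table name quality_seq and its location are hypothetical, and the codec class com.hadoop.compression.lzo.LzoCodec assumes the hadoop-lzo library is installed and registered on the cluster. The property names are the older mapred.* ones.

```sql
-- Assumption: hadoop-lzo is installed, so the LZO codec class below can be loaded.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Hypothetical table name and location; the columns mirror the quality table.
create table quality_seq
(id bigint,
 total bigint,
 error bigint
)
partitioned by (ds string)
row format delimited fields terminated by '\t'
stored as sequencefile
location '/user/uname/quality_seq'
;

-- Same daily load as before, now written as a block-compressed SequenceFile.
insert overwrite table quality_seq partition (ds='20120709')
SELECT id , count2 , coalesce(error, cast(0 AS BIGINT)) AS count1 FROM Table1;
```

With block compression, a SequenceFile stays splittable regardless of the codec, which is part of why it is being considered here. Whether it actually makes queries faster than the plain TextFile table would need to be measured on the real data.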