Hi Experts
            I'm currently struck when using compression with hive. My 
requirement is I need to store data in hive tables in LZO compressed format and 
should be able to retrieve the same successfully. However with the hive wiki 
and mailing list archives I was able to succeed only on the first part 'Storing 
data in LZO compressed format in hive tables' but failing with the second part 
'Retrieving data back from the table'.

I have a non partitioned table that holds uncompressed text data and want to 
load this data into a partitioned table in LZO format. I used Hive Dynamic 
Partition concept to populate data into partitions for the same. On execution 
of the hive QL  data is loaded in compressed format on target table, the hive 
CLI showed the no of rows loaded which was correct. Also I verified the 
directory that corresponded to the table in hdfs, there also I could find sub 
folders and files within with .deflate extension in sub folders. But when I 
fire any query on this table I can see the map reduce job executing and showing 
an OK finally but No results are returned under any conditions. It is the same 
with a simple 'Select *' query as well, Just  an OK message with no rows 
returned. On the hive CLI I enabled the following parameters.

//for output compress
set hive.exec.compress.output=true;

//for dynamic partitions

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

I have posted below the queries i have executed from my end. Could any one 
please guide me on how you have actually implemented LZO on your hive table  or 
help me out in crossing this hurdle? 


Thanks in Advance

Source table

CREATE EXTERNAL TABLE bejoy_test_employee(eemployeeId Int, name String,codeId 
Int,country String )
COMMENT 'test table for LZO compression' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/u/bejoy/external_tables/bejoy_test_employee' ;

Target Table

CREATE EXTERNAL TABLE bejoy_test_employee_partn(employeeId Int, name 
String,codeId Int)
COMMENT 'test table for LZO compression' 
PARTITIONED BY(country String) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/u/bejoy/external_tables/bejoy_test_employee_partn' ;

Query used for Dynamic Partitions
INSERT OVERWRITE TABLE bejoy_test_employee_partn PARTITION(country)
SELECT eemployeeId,name,codeId,country FROM bejoy_test_employee;


//Hive CLI log once the Dynamic Partitions query is executed
Partition default.bejoy_test_employee_partn{country=Australia} stats: 
[num_files: 1, num_rows: 0, total_size: 62114]
Partition default.bejoy_test_employee_partn{country=Canada} stats: [num_files: 
1, num_rows: 0, total_size: 62327]
Partition default.bejoy_test_employee_partn{country=France} stats: [num_files: 
1, num_rows: 0, total_size: 62377]
Partition default.bejoy_test_employee_partn{country=Germany} stats: [num_files: 
1, num_rows: 0, total_size: 61667]
Partition default.bejoy_test_employee_partn{country=India} stats: [num_files: 
1, num_rows: 0, total_size: 62341]
Partition default.bejoy_test_employee_partn{country=Italy} stats: [num_files: 
1, num_rows: 0, total_size: 62463]
Partition default.bejoy_test_employee_partn{country=Japan} stats: [num_files: 
1, num_rows: 0, total_size: 62201]
Partition default.bejoy_test_employee_partn{country=Middle-East} stats: 
[num_files: 1, num_rows: 0, total_size: 62360]
Partition default.bejoy_test_employee_partn{country=UK} stats: [num_files: 1, 
num_rows: 0, total_size: 62334]
Partition default.bejoy_test_employee_partn{country=USA} stats: [num_files: 1, 
num_rows: 0, total_size: 62191]
Table default.bejoy_test_employee_partn stats: [num_partitions: 10, num_files: 
10, num_rows: 0, total_size: 622375]
101040 Rows loaded to bejoy_test_employee_partn
OK
Time taken: 15.576 seconds


//hive CLI log once queries are fired on the table
hive> select * from  bejoy_test_employee_partn;
OK
Time taken: 0.074 seconds

Regards
Bejoy.K.S

Reply via email to