Hi Experts I'm currently struck when using compression with hive. My requirement is I need to store data in hive tables in LZO compressed format and should be able to retrieve the same successfully. However with the hive wiki and mailing list archives I was able to succeed only on the first part 'Storing data in LZO compressed format in hive tables' but failing with the second part 'Retrieving data back from the table'.
I have a non partitioned table that holds uncompressed text data and want to load this data into a partitioned table in LZO format. I used Hive Dynamic Partition concept to populate data into partitions for the same. On execution of the hive QL data is loaded in compressed format on target table, the hive CLI showed the no of rows loaded which was correct. Also I verified the directory that corresponded to the table in hdfs, there also I could find sub folders and files within with .deflate extension in sub folders. But when I fire any query on this table I can see the map reduce job executing and showing an OK finally but No results are returned under any conditions. It is the same with a simple 'Select *' query as well, Just an OK message with no rows returned. On the hive CLI I enabled the following parameters. //for output compress set hive.exec.compress.output=true; //for dynamic partitions set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; I have posted below the queries i have executed from my end. Could any one please guide me on how you have actually implemented LZO on your hive table or help me out in crossing this hurdle? Thanks in Advance Source table CREATE EXTERNAL TABLE bejoy_test_employee(eemployeeId Int, name String,codeId Int,country String ) COMMENT 'test table for LZO compression' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/u/bejoy/external_tables/bejoy_test_employee' ; Target Table CREATE EXTERNAL TABLE bejoy_test_employee_partn(employeeId Int, name String,codeId Int) COMMENT 'test table for LZO compression' PARTITIONED BY(country String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" LOCATION '/u/bejoy/external_tables/bejoy_test_employee_partn' ; Query used for Dynamic Partitions INSERT OVERWRITE TABLE bejoy_test_employee_partn PARTITION(country) SELECT eemployeeId,name,codeId,country FROM bejoy_test_employee; //Hive CLI log once the Dynamic Partitions query is executed Partition default.bejoy_test_employee_partn{country=Australia} stats: [num_files: 1, num_rows: 0, total_size: 62114] Partition default.bejoy_test_employee_partn{country=Canada} stats: [num_files: 1, num_rows: 0, total_size: 62327] Partition default.bejoy_test_employee_partn{country=France} stats: [num_files: 1, num_rows: 0, total_size: 62377] Partition default.bejoy_test_employee_partn{country=Germany} stats: [num_files: 1, num_rows: 0, total_size: 61667] Partition default.bejoy_test_employee_partn{country=India} stats: [num_files: 1, num_rows: 0, total_size: 62341] Partition default.bejoy_test_employee_partn{country=Italy} stats: [num_files: 1, num_rows: 0, total_size: 62463] Partition default.bejoy_test_employee_partn{country=Japan} stats: [num_files: 1, num_rows: 0, total_size: 62201] Partition default.bejoy_test_employee_partn{country=Middle-East} stats: [num_files: 1, num_rows: 0, total_size: 62360] Partition default.bejoy_test_employee_partn{country=UK} stats: [num_files: 1, num_rows: 0, total_size: 62334] Partition default.bejoy_test_employee_partn{country=USA} stats: [num_files: 1, num_rows: 0, total_size: 62191] Table default.bejoy_test_employee_partn stats: [num_partitions: 10, num_files: 10, num_rows: 0, total_size: 622375] 101040 Rows loaded to bejoy_test_employee_partn OK Time taken: 15.576 seconds //hive CLI log once queries are fired on the table hive> select * from bejoy_test_employee_partn; OK Time taken: 0.074 seconds Regards Bejoy.K.S