Hello,

I have started running Hive with LZO compression on Hortonworks 1.2. I have managed to install and configure LZO, and hive -e "set io.compression.codecs" shows the LZO codecs:

io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.DefaultCodec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.BZip2Codec

However, I have a few questions and would be happy if you could help me.

(1) CREATE TABLE statement
I read in several postings that the CREATE TABLE statement needs the following STORED AS clause:

CREATE EXTERNAL TABLE txt_table_lzo (
  txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';

SELECT statements on this table with LZO data now work without any problems. However, I also created a table over the same data without the STORED AS clause:

CREATE EXTERNAL TABLE txt_table_lzo_tst (
  txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
LOCATION '/user/myuser/data/in/lzo_compressed';

Interestingly, SELECT statements on this table work as well.

Can you explain why the second CREATE TABLE statement also works? What should I use in my DDL? Is it best practice to use the STORED AS clause with DeprecatedLzoTextInputFormat, or should I remove it?

(2) Output and intermediate compression settings

I want to use output compression. "Programming Hive" by Capriolo, Wampler and Rutherglen recommends the following commands:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

However, in other places in forums I found these recommended settings:

SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

Am I right that the first settings are for Hadoop versions prior to 0.23, or is there another reason why the settings differ? I am using Hadoop 1.1.2 with Hive 0.10.0. Which settings would you recommend?

--------------

I also want to compress intermediate results.
Again, "Programming Hive" recommends the following settings:

SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

Is this the right setting, or should I again use the following settings, which look more appropriate for Hadoop 0.23 and later?

SET hive.exec.compress.intermediate=true;
SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

Thanks
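For reference, here is a sketch of how the codec property names quoted above line up between Hadoop 1.x and Hadoop 0.23/2.x. It assumes the renames listed in Hadoop's deprecated-properties table: Hadoop 1.x (including 1.1.2) only reads the mapred.* names, while 0.23/2.x still accepts them as deprecated aliases and logs a warning. The hive.exec.* properties are Hive's own and are spelled the same on both. Note that, as far as I can tell, the newer name for the map-output codec is mapreduce.map.output.compress.codec, not mapreduce.map.output.compression.codec as quoted above.

```sql
-- Hadoop 1.x spelling (the only names Hadoop 1.1.2 reads), as in "Programming Hive":
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
SET hive.exec.compress.intermediate=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

-- Hadoop 0.23/2.x spelling of the same two codec properties
-- (assumption: the mapred.* names above still work there, with deprecation warnings):
-- SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
-- SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzopCodec;
```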