Hi,

I have Hive 0.10 (with CDH 4.2.1 patches) installed on my cluster.

I have a table facts520_normal_text stored as a textfile, and I'm trying to
create a compressed copy of it using the GZip codec:

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
hive> SET mapred.output.compression.type=BLOCK;

hive>
    > Create table facts520_gzip_text
    > (fact_key BIGINT,
    > products_key INT,
    > retailers_key INT,
    > suppliers_key INT,
    > time_key INT,
    > units INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;

hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;


When I run the INSERT above, the MapReduce job fails.

The Hive CLI itself shows the following error output:

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201306051948_0010, Tracking URL = http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201306051948_0010
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
2013-06-05 21:09:42,281 Stage-1 map = 0%,  reduce = 0%
2013-06-05 21:10:11,446 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201306051948_0010 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
Examining task ID: task_201306051948_0010_m_000004 (and more) from job job_201306051948_0010
Examining task ID: task_201306051948_0010_m_000001 (and more) from job job_201306051948_0010

Task with the most failures(4):
-----
Task ID:
  task_201306051948_0010_m_000002

URL:
  http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
        at org.apach

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


I'm unable to figure out why this is happening. It looks like the data
cannot be copied over properly. Or is the GZip codec simply not supported
for textfile tables?
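
As a sanity check that the row itself is gzip-friendly, independent of
Hadoop, I round-tripped the failing row (taken from the diagnostic message
above, written out comma-delimited as it would appear in the textfile)
through plain gzip outside the cluster. A minimal standalone Python sketch
of that check:

```python
import gzip
import io

# The row Hive reported when the task failed, formatted as a
# comma-delimited textfile line (per the table's ROW FORMAT).
row = "7549094,205,304,402,103,23\n"

# Compress the line in memory...
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(row.encode("utf-8"))

# ...then decompress and confirm the data survives unchanged.
restored = gzip.decompress(buf.getvalue()).decode("utf-8")
assert restored == row
print("gzip round-trip OK")
```

The round-trip succeeds, so the data itself compresses fine with gzip; the
failure seems to be on the Hadoop/Hive side.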

Any help in this issue is greatly appreciated!

Thank you,
Sachin
