I am having trouble reading gzip-compressed input. Is this a known problem? Are there any workarounds? (I am using gzip 1.3.3.)
Thanks,
Delip

$ hadoop dfs -ls input
Found 1 items
-rw-r--r--   3 huser supergroup   17532230 2008-12-11 23:52 /user/huser/input/words.gz

$ hadoop jar hadoop-0.19.0-examples.jar wordcount input output
08/12/12 00:23:10 INFO mapred.FileInputFormat: Total input paths to process : 1
08/12/12 00:23:10 INFO mapred.JobClient: Running job: job_200812100142_0072
08/12/12 00:23:11 INFO mapred.JobClient:  map 0% reduce 0%
08/12/12 00:23:32 INFO mapred.JobClient: Task Id : attempt_200812100142_0072_m_000000_0, Status : FAILED
java.lang.InternalError
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:114)
        at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:82)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:321)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

08/12/12 00:23:44 INFO mapred.JobClient: Task Id : attempt_200812100142_0072_m_000000_1, Status : FAILED
...