MultiFileWordCount uses its own RecordReader, namely
MultiFileLineRecordReader. This differs from LineRecordReader,
which automatically detects the file's codec and decodes it.
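Here is a rough plain-Java illustration of what "detects the file's codec" means. It uses no Hadoop classes, so it runs standalone. The code picks a decompressor from the file-name suffix, which is roughly what LineRecordReader gets by asking CompressionCodecFactory.getCodec(Path) for a codec. It then reads lines from the decoded stream, walking several files in order the way a multi-file reader walks its split. The class and method names (CodecAwareLineReader, readSplit, etc.) are hypothetical. Only ".gz" is recognized, and java.util.zip.GZIPInputStream stands in for Hadoop's gzip codec.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-alone sketch, not Hadoop's API: suffix-based codec
// detection followed by line reading, applied file by file.
public class CodecAwareLineReader {

    // Choose a decompressing wrapper from the file name. Only ".gz" is
    // handled here; any other name is read as plain bytes.
    static InputStream openMaybeCompressed(String name, InputStream raw) throws IOException {
        if (name.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read every line of one file (given as a name plus its bytes),
    // decoding it first if its suffix says it is compressed.
    static List<String> readLines(String name, byte[] contents) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                openMaybeCompressed(name, new ByteArrayInputStream(contents)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Read all files of one "split" back to back, decoding each one
    // independently, as a multi-file reader would when opening each path.
    static List<String> readSplit(Map<String, byte[]> files) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            lines.addAll(readLines(file.getKey(), file.getValue()));
        }
        return lines;
    }

    // Demo helper: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> split = new LinkedHashMap<>();
        split.put("part-0.txt", "a b\nc\n".getBytes(StandardCharsets.UTF_8));
        split.put("part-1.gz", gzip("d e\n"));
        System.out.println(readSplit(split)); // prints [a b, c, d e]
    }
}
```

In a MultiFileLineRecordReader-style reader, the same idea would mean doing this codec check each time the reader opens the next path in its split, instead of assuming every file is plain text.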
You can write a custom RecordReader similar to LineRecordReader and
MultiFileLineRecordReader, or just add
Hi all,
I'm writing some Hadoop jobs that should run on a collection of
gzipped files. Everything is already working correctly with
MultiFileInputFormat and an initial step of gunzip extraction.
Considering that Hadoop recognizes and correctly handles .gz files (at
least with a single file