Thanks Azuryy. That was exactly what I wanted to know.
On Sun, Dec 15, 2013 at 7:53 PM, Azuryy Yu <azury...@gmail.com> wrote:
> Hi Jiayu,
>
> For a SequenceFile input, the CompressionCodec class is serialized in the
> file header, so the SequenceFile reader knows which compression algorithm
> to use.
>
> Thanks.
>
> On Mon, Dec 16, 2013 at 8:28 AM, Jiayu Ji <jiayu...@gmail.com> wrote:
>> Thanks Tao. I know I can tell it is an LZO file from the magic number.
>> What I am curious about is which Hadoop class the MapReduce job uses to
>> determine the file's compression algorithm. At the end of the day, I am
>> trying to figure out whether all the inputs of a MapReduce job have to
>> be compressed with the same algorithm.
>>
>> On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:
>>> I suggest you download the LZO-compressed file, whether or not its
>>> name has an .lzo extension, open it as hex bytes with a tool like
>>> UltraEdit, and look at its header contents.
>>>
>>> 2013/12/14 Jiayu Ji <jiayu...@gmail.com>
>>>> Hi,
>>>>
>>>> I have a question about how a MapReduce job determines the
>>>> compression codec of a file on HDFS. From what I read in the
>>>> Definitive Guide (page 86), "the CompressionCodecFactory provides a
>>>> way of mapping a filename extension to a CompressionCodec using its
>>>> getCodec() method". I did a test with an LZO-compressed file without
>>>> an .lzo extension; however, the MapReduce job was still able to pick
>>>> the right codec. Does anyone know why? Thanks in advance.
>>>>
>>>> Jiayu
>>
>> --
>> Jiayu (James) Ji,
>>
>> Cell: (312)823-7393

--
Jiayu (James) Ji,

Cell: (312)823-7393
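Tao's suggestion of inspecting the header bytes can be sketched in code. The snippet below recognizes a container format from its leading magic bytes. This is an illustrative sketch, not Hadoop's own detection logic: as Azuryy notes, a SequenceFile records its codec class in its header, and CompressionCodecFactory.getCodec() goes by filename extension. The gzip magic (0x1F 0x8B) is standard; the lzop magic shown is my assumption about the .lzo container produced by lzop/hadoop-lzo.

```java
import java.util.Arrays;

public class MagicSniffer {
    // Assumed lzop container magic: 89 4C 5A 4F 00 0D 0A 1A 0A
    private static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0D, 0x0A, 0x1A, 0x0A
    };
    // Standard gzip magic: 1F 8B
    private static final byte[] GZIP_MAGIC = { 0x1F, (byte) 0x8B };

    /** Returns a format name guessed from the file's leading bytes. */
    public static String detect(byte[] header) {
        if (startsWith(header, LZOP_MAGIC)) return "lzo";
        if (startsWith(header, GZIP_MAGIC)) return "gzip";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A hex viewer like UltraEdit shows the same bytes this method compares against; the filename extension plays no role here, which is why the file in the original question could still be identified without one.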