Thanks, Tao. I know I can tell it is an LZO file from its magic number. What I am curious about is which class in Hadoop the MapReduce job uses to determine a file's compression algorithm. Ultimately, I am trying to figure out whether all the inputs of a MapReduce job have to be compressed with the same algorithm.
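For reference, the magic-number check I mean can be sketched roughly as below. This is only a minimal sketch, assuming the file was written by the lzop tool (whose files start with a fixed 9-byte signature). The class name LzopMagicCheck and its methods are hypothetical names of my own, not part of Hadoop, and this says nothing about how the MapReduce job itself picks the codec.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Hadoop class.
class LzopMagicCheck {
    // The 9-byte signature at the start of a file written by lzop:
    // 0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1a '\n'
    static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 0x4c, 0x5a, 0x4f, 0x00, 0x0d, 0x0a, 0x1a, 0x0a
    };

    // Returns true if the given bytes begin with the lzop signature.
    static boolean hasLzopMagic(byte[] header) {
        if (header.length < LZOP_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, LZOP_MAGIC.length), LZOP_MAGIC);
    }

    // Reads up to 9 bytes from the stream and checks them against the signature.
    static boolean hasLzopMagic(InputStream in) throws IOException {
        byte[] header = new byte[LZOP_MAGIC.length];
        int off = 0;
        while (off < header.length) {
            int n = in.read(header, off, header.length - off);
            if (n < 0) {
                break; // stream ended before 9 bytes were available
            }
            off += n;
        }
        return off == header.length && hasLzopMagic(header);
    }

    // Usage: java LzopMagicCheck <local file path>
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(hasLzopMagic(in) ? "lzop magic found" : "no lzop magic");
        }
    }
}
```

This only inspects a local copy of the file, so the file would have to be downloaded from HDFS first, as Tao suggested.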

On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:

> I suggest you download the LZO-compressed file, whether or not its file
> name has an .lzo extension, open it as hex bytes with a tool like
> UltraEdit, and have a look at its header contents.
>
> 2013/12/14 Jiayu Ji <jiayu...@gmail.com>
>
>> Hi,
>>
>> I have a question about how a MapReduce job determines the compression
>> codec of a file on HDFS. From what I read in the Definitive Guide (page
>> 86), "the CompressionCodecFactory provides a way of mapping a filename
>> extension to a CompressionCodec using its getCodec() method". I ran a
>> test with an LZO-compressed file that did not have an .lzo extension.
>> However, the MapReduce job was still able to get the right codec. Does
>> anyone know why? Thanks in advance.
>>
>> Jiayu

--
Jiayu (James) Ji, Cell: (312)823-7393