Hi, it would be interesting to see the jobs' statistics (counters).
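Until the counters are available, the run times quoted below already give a rough sense of scale. The following is only a back-of-the-envelope sketch: it assumes every run processed the same 10 GiB of original text (the compressed on-disk sizes were not reported), and the names `INPUT_MIB`, `run_times_s`, `throughput_mib_per_s`, and `slowdown` are made up for illustration.

```python
# Rough throughput implied by the run times in the quoted mail.
# Assumption: each run processes the same 10 GiB of original
# (uncompressed) text; compressed input sizes were not reported.
INPUT_MIB = 10 * 1024

# Wall-clock WordCount run times in seconds, as reported by the poster.
run_times_s = {"none": 248, "lzo": 243, "seq": 1410}


def throughput_mib_per_s(seconds):
    """Aggregate MiB of original input processed per second."""
    return INPUT_MIB / seconds


def slowdown(name, baseline="none"):
    """How many times longer `name` ran than the baseline input."""
    return run_times_s[name] / run_times_s[baseline]


if __name__ == "__main__":
    for name, seconds in run_times_s.items():
        print(f"{name:>4}: {throughput_mib_per_s(seconds):6.1f} MiB/s, "
              f"{slowdown(name):.2f}x the uncompressed run time")
```

This only restates the reported numbers as rates; it says nothing about where the extra time goes, which is what the counters should show.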
Thanks

On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park <[email protected]> wrote:
> Hi, All
>
> I have tested which method is better, Lzo or SequenceFile, for a BIG
> file.
>
> The file size is 10GiB and the WordCount MR job is used.
> The inputs of the WordCount MR job are an lzo file indexed by
> LzoIndexTool (lzo), a sequence file compressed with block-level
> snappy (seq), and the uncompressed original file (none).
>
> Map output is compressed except for the uncompressed file. MapReduce output
> is not compressed in any of the cases.
>
> The following are the WordCount MR running times:
>
>     none    lzo     seq
>     248s    243s    1410s
>
> - Test Environment
>
> OS : CentOS 5.6 (x64) (kernel = 2.6.18)
> # of Core : 8 (cpu = Intel(R) Xeon(R) CPU E5504 @ 2.00GHz)
> RAM : 18GB
> Java version : 1.6.0_26
> Hadoop version : CDH3U2
> # of datanode(tasktracker) : 8
>
> According to the result, the running time of SequenceFile is much longer
> than the others.
> Before testing, I had expected the results for SequenceFile and Lzo to be
> about the same.
>
> I want to know why the performance of the sequence file compressed by
> snappy is so bad.
>
> Did I miss anything in the tests?
>
>
> Regards,
> Park
>

--
Best Regards,
Ruslan Al-Fakikh
