Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-28 Thread Hardik Pandya
What is your compression format: gzip, LZO, or Snappy? For LZO final output: FileOutputFormat.setCompressOutput(conf, true); FileOutputFormat.setOutputCompressorClass(conf, LzoCodec.class); In addition, to make LZO splittable, you need to create an LZO index file. On Thu, Mar 27, 2014 at 8:57 PM,
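Why the index matters: a plain LZO or gzip stream can only be decoded from its beginning, so without extra information a second mapper cannot start reading in the middle of the file. The following is an analogy in plain Java using java.util.zip gzip rather than LZO, and it does not reproduce the hadoop-lzo index format. It compresses each chunk as an independent gzip member, records each member's start offset in an in-memory index, and then decodes a single block on its own from that offset. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration: independently compressed blocks plus an offset index.
// This is NOT the hadoop-lzo index format; it only shows why an index enables splits.
class IndexedBlockFile {
    final byte[] data;              // concatenated compressed blocks (the "file")
    final List<Long> blockOffsets;  // the "index": start offset of each block

    private IndexedBlockFile(byte[] data, List<Long> blockOffsets) {
        this.data = data;
        this.blockOffsets = blockOffsets;
    }

    // Compress each chunk as its own gzip member and remember where it starts.
    static IndexedBlockFile write(List<String> chunks) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        List<Long> offsets = new ArrayList<>();
        for (String chunk : chunks) {
            offsets.add((long) file.size());
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(block)) {
                gz.write(chunk.getBytes(StandardCharsets.UTF_8));
            }
            block.writeTo(file);
        }
        return new IndexedBlockFile(file.toByteArray(), offsets);
    }

    // Decode block i alone, starting at its indexed offset, as a split reader would.
    String readBlock(int i) throws IOException {
        int start = blockOffsets.get(i).intValue();
        int end = (i + 1 < blockOffsets.size())
                ? blockOffsets.get(i + 1).intValue()
                : data.length;
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(data, start, end - start))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A real split reader would also have to map an arbitrary byte range onto the nearest indexed block boundary; that step is left out here.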

Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
I have a simple M/R job using a Mapper only, thus no reducer. The mapper reads a timestamp from the value, generates a path to the output file, and writes the key and value to the output file. The input file is a sequence file, not compressed and stored in HDFS; it has a size of 162.68 MB. Output
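The mapper step described above, deriving an output path from the record's timestamp, might look roughly like the helper below. This is a sketch under assumptions the thread does not state: the timestamp is in epoch milliseconds, output is bucketed by hour in UTC, and the name TimestampPaths.baseOutputPath is hypothetical. In the actual job, such a string would typically be passed as the baseOutputPath argument of MultipleOutputs.write(key, value, baseOutputPath).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turn a record's timestamp into an output path prefix.
// The yyyy/MM/dd/HH layout and the UTC zone are assumptions, not taken from the thread.
class TimestampPaths {
    private static final DateTimeFormatter LAYOUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

    // Returns e.g. "2014/03/27/18/part" for a timestamp within that UTC hour.
    static String baseOutputPath(long epochMillis) {
        return LAYOUT.format(Instant.ofEpochMilli(epochMillis)) + "/part";
    }
}
```

Records whose timestamps fall in the same hour map to the same path, which is exactly the situation raised later in the thread about identical timestamps.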

Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Thomas Bentsen
Have you checked the content of the files you write? /th On Thu, 2014-03-27 at 11:43 -0700, Kim Chew wrote: I have a simple M/R job using a Mapper only, thus no reducer. The mapper reads a timestamp from the value, generates a path to the output file, and writes the key and value to the output
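One quick way to check what a sequence file really contains is to look at its header. With Hadoop on the classpath, hadoop fs -text or SequenceFile.Reader's isCompressed() and getCompressionCodec() are the usual tools. The plain-Java sketch below instead parses the header bytes directly, assuming the layout that SequenceFile.Writer produces for version-6 headers: "SEQ", a version byte, the key and value class names as vint-prefixed strings, the compressed and block-compressed flags, then the codec class name when compressed. Treat it as a diagnostic aid rather than a replacement for the Hadoop reader; the class name SeqFileHeader is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Diagnostic sketch: parse the leading fields of a SequenceFile header.
// Assumes the layout written by Hadoop's SequenceFile.Writer (version 6 headers).
class SeqFileHeader {
    String keyClass, valueClass, codecClass;  // codecClass stays null when uncompressed
    boolean compressed, blockCompressed;

    static SeqFileHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        int version = in.readByte();
        if (version < 6) {  // older headers use a different layout; not handled here
            throw new IOException("unsupported SequenceFile version " + version);
        }
        SeqFileHeader h = new SeqFileHeader();
        h.keyClass = readString(in);
        h.valueClass = readString(in);
        h.compressed = in.readBoolean();
        h.blockCompressed = in.readBoolean();
        if (h.compressed) {
            h.codecClass = readString(in);
        }
        return h;
    }

    // Mirrors Hadoop's Text.writeString: vint byte length followed by UTF-8 bytes.
    private static String readString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[(int) readVLong(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decoder for Hadoop's variable-length long encoding (as in WritableUtils.readVLong).
    private static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;  // single-byte value in [-112, 127]
        }
        boolean negative = first < -120;
        int len = negative ? -119 - first : -111 - first;  // total bytes, including first
        long value = 0;
        for (int i = 0; i < len - 1; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }
}
```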

Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
Yea, gonna do that. 8-) Kim On Thu, Mar 27, 2014 at 12:30 PM, Thomas Bentsen t...@bentzn.com wrote: Have you checked the content of the files you write? /th On Thu, 2014-03-27 at 11:43 -0700, Kim Chew wrote: I have a simple M/R job using a Mapper only, thus no reducer. The mapper reads a

Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
I am also wondering: say I have two identical timestamps, so they are going to be written to the same file. Does MultipleOutputs handle appending? Thanks. Kim On Thu, Mar 27, 2014 at 12:30 PM, Thomas Bentsen t...@bentzn.com wrote: Have you checked the content of the files you write? /th
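On the appending question: as I understand the Hadoop source, MultipleOutputs does not append to existing HDFS files. Within one task it keeps a single open RecordWriter per base output path, so two records with the same timestamp in the same task are written through the same open writer, and each task's files get a task-specific suffix (e.g. -m-00000), so different mappers do not collide. Please verify this against the MultipleOutputs source for your Hadoop version. The toy model below uses hypothetical names, with a StringBuilder standing in for a RecordWriter, and shows only the caching pattern.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical names): one open "writer" per output path, reused for
// every record that maps to that path, in the spirit of how MultipleOutputs caches
// RecordWriters within a single task. StringBuilder stands in for a real writer.
class PathWriterCache {
    private final Map<String, StringBuilder> writers = new HashMap<>();
    private int opened = 0;  // number of distinct writers created so far

    void write(String path, String key, String value) {
        // Reuse the writer already open for this path; create one only on first use.
        StringBuilder w = writers.computeIfAbsent(path, p -> {
            opened++;
            return new StringBuilder();
        });
        w.append(key).append('\t').append(value).append('\n');
    }

    int openedWriters() {
        return opened;
    }

    String contents(String path) {
        return writers.getOrDefault(path, new StringBuilder()).toString();
    }
}
```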

Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
Thanks, folks. I was not aware that my input data file had been compressed: FileOutputFormat.setCompressOutput() was set to true when the file was written. 8-( Kim On Thu, Mar 27, 2014 at 5:46 PM, Mostafa Ead mostafa.g@gmail.com wrote: The following might answer you partially: Input key is not
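This explains the counters: HDFS_BYTES_READ counts the bytes actually read from HDFS, i.e. the compressed size of the input, while output written without compression is stored at full size. The plain-Java sketch below illustrates the size gap using java.util.zip gzip on hypothetical, repetitive timestamp-keyed records. The ratio it prints depends entirely on the data and the codec (the thread does not say which codec the input used), so it is not a prediction for this job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: compare the size of some repetitive records when stored
// compressed (what the read counter would see) vs. written uncompressed.
class CompressionGap {
    // Hypothetical records: timestamp-keyed lines, similar in shape to log data.
    static byte[] sampleRecords(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("1395945780000\tevent=click;user=").append(i % 50).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Gzip-compress a byte array in memory.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleRecords(10_000);
        byte[] compressed = gzip(raw);
        System.out.printf("compressed (read): %d bytes, uncompressed (written): %d bytes, ratio %.1fx%n",
                compressed.length, raw.length, (double) raw.length / compressed.length);
    }
}
```

Conversely, turning on output compression with FileOutputFormat.setCompressOutput(conf, true), as in Hardik's reply, would typically bring the written byte count much closer to the read count.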