Any way to do a "local" reduce like the combiner does?

2012-01-29 Thread Jianhui Zhang
I have a problem at hand that seems to need "local" reducing: I have a large data input, in which each line is a data mapping, something like "name : attribute". The attributes for the same name are usually pretty close in the file, so they are very likely to be processed by the same mapper.
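One way to picture the "local" reducing described above is the following minimal sketch in plain Java, with no Hadoop dependency. It assumes each line has the form "name : attribute" and that attributes for the same name mostly arrive as consecutive runs; it groups each run into one record before anything would be emitted. The class name LocalReduceSketch, the method combineRuns, and the tab/comma output format are hypothetical, chosen only for illustration, and are not taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalReduceSketch {

    /**
     * Groups consecutive "name : attribute" lines that share a name.
     * A name that reappears later starts a new group, so a downstream
     * reducer would still have to merge groups with the same name.
     */
    public static List<String> combineRuns(List<String> lines) {
        List<String> out = new ArrayList<>();
        String currentName = null;
        List<String> attrs = new ArrayList<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue; // assumption: silently skip malformed lines
            }
            String name = line.substring(0, sep).trim();
            String attr = line.substring(sep + 1).trim();
            if (!name.equals(currentName)) {
                // The run for the previous name has ended: emit it.
                flush(currentName, attrs, out);
                currentName = name;
            }
            attrs.add(attr);
        }
        flush(currentName, attrs, out); // emit the final run
        return out;
    }

    // Emits one "name<TAB>attr1,attr2,..." record and resets the buffer.
    private static void flush(String name, List<String> attrs, List<String> out) {
        if (name != null && !attrs.isEmpty()) {
            out.add(name + "\t" + String.join(",", attrs));
        }
        attrs.clear();
    }
}
```

In a real mapper the same buffering would live in the map method, with the final flush done when the mapper finishes; only one name's attributes are held in memory at a time.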

MR output to a file instead of directory?

2012-03-02 Thread Jianhui Zhang
Hi all, The FileOutputFormat/FileOutputCommitter always treats an output path as a directory and writes files under it, even if there is only one Reducer. Is there any way to configure an OutputFormat to write all data into a single file? Thanks, James
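A common workaround, rather than an OutputFormat setting, is to concatenate the part files after the job finishes; on a cluster, "hadoop fs -getmerge" does this into a local file. The sketch below shows the same idea in plain Java on a local filesystem only. It assumes the output directory holds files named part-* and is not HDFS; the class name MergeParts and the method mergeParts are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeParts {

    /**
     * Concatenates every part-* file in outputDir into target, in
     * sorted path order. Other files (for example _SUCCESS) are ignored.
     */
    public static void mergeParts(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // directory listing order is not guaranteed
        try (OutputStream os = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, os); // append this part's bytes to the target
            }
        }
    }
}
```

With a single reducer there is only one part file, so the merge amounts to a copy under the desired file name.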

How to get mapper counters (or equivalents) in reducer

2012-04-30 Thread Jianhui Zhang
Hi folks, Version: Hadoop 0.20.205. My reducer can be optimized if I can get a good estimate of how many records are produced by the mappers, that is, if I can get the MAP_OUTPUT_RECORDS counter (or its equivalent) in my reducer. However, I tried and always got 0; I guess the counters are not passe