I have a problem that seems to call for "local" reducing.
I have a large input file in which each line is a mapping of the form
"name : attribute". The attributes for a given name usually sit close together
in the file, so they are very likely to be processed by the same mapper.
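To make the idea concrete, here is a minimal sketch of that kind of local reducing in plain Java, with no Hadoop classes involved. It assumes that lines for the same name are adjacent and that the first ':' on a line separates name from attribute. The class name `LocalReduceSketch`, the method `groupAdjacent`, and the tab/comma output format are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LocalReduceSketch {

    /**
     * Collapses each run of consecutive lines that share a name into one
     * record of the form "name<TAB>attr1,attr2,...". Only the attributes of
     * the current name are buffered, so memory use stays small.
     */
    static List<String> groupAdjacent(Iterable<String> lines) {
        List<String> out = new ArrayList<>();
        List<String> attrs = new ArrayList<>();
        String currentName = null;
        for (String line : lines) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue; // assumption: silently skip lines without a ':'
            }
            String name = line.substring(0, sep).trim();
            String attr = line.substring(sep + 1).trim();
            if (!name.equals(currentName)) {
                // The name changed: emit the finished run and start a new one.
                flush(currentName, attrs, out);
                currentName = name;
            }
            attrs.add(attr);
        }
        flush(currentName, attrs, out); // emit the last run
        return out;
    }

    private static void flush(String name, List<String> attrs, List<String> out) {
        if (name != null && !attrs.isEmpty()) {
            out.add(name + "\t" + String.join(",", attrs));
        }
        attrs.clear();
    }
}
```

In a real job the same buffering would presumably live inside the mapper, emitting one grouped record per run instead of one record per line. A name that shows up again later in the file, or in another split, still produces a separate record, so a regular reduce step would still be needed to merge those.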
Hi all,
The FileOutputFormat/FileOutputCommitter always treats an output path
as a directory and writes files under it, even if there is only one
Reducer. Is there any way to configure an OutputFormat to write all
data into a single file?
Thanks,
James
Hi folks,
Version: Hadoop 0.20.205.
My reducer can be optimized if I can get a good estimate of how many
records are produced by the mappers, that is, if I can read the
MAP_OUTPUT_RECORDS counter (or its equivalent) in my reducer. However,
I tried and always got 0, guess the counters are not passe