Shai,
It's hard to determine what the best solution would be without knowing more
about your problem. In general, combiner functions work well but they will be
of little value if each mapper output contains a unique key. This is because
combiner functions only "combine" multiple values associated with the same key.
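For what it's worth, a combiner is just a Reducer that runs on the map side and collapses the values that share a key before anything goes over the network. A rough, untested sketch (word-count-style sum; the class name is made up):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();                        // collapse many (key, 1) pairs into one (key, n)
    }
    context.write(key, new IntWritable(sum));
  }
}

You register it in the driver with job.setCombinerClass(SumCombiner.class); if every key carries only one value, there is simply nothing for it to collapse.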
Thanks Alex and Ken.
My application does not do aggregation. It mainly does some data
cleansing and transformation. So I don't need a combiner. (Also, I don't
see why a combiner always needs sorted input; it should be optional and
user specified)
To take advantage of some optimizations, I n
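To make the "cleansing and transformation" part concrete, a rough sketch of such a mapper (illustrative only; cleanRecord() stands in for the real rules, and the record's input offset is used as its key):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleansingMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  private final Text out = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String cleaned = cleanRecord(line.toString());  // hypothetical transformation
    out.set(cleaned);
    context.write(offset, out);                     // one output record per input record, nothing to aggregate
  }

  private String cleanRecord(String raw) {
    return raw.trim().toLowerCase();                // stand-in for the real cleansing logic
  }
}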
Thanks for the prompt response, Amogh!
I'm kinda rookie w/ Hadoop, so please forgive my perhaps "too rookie"
questions :).
> Check the property mapred.reduce.slowstart.completed.maps
>
From what I read here
(http://hadoop.apache.org/common/docs/current/mapred-default.html), this
parameter controls the fraction of map tasks that must complete before the
reducers are scheduled.
Hi,
>>What would really be great for me is if I could have the Reducer start
>>processing the map outputs as they are ready, and not after all Mappers finish
Check the property mapred.reduce.slowstart.completed.maps
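For example, in the driver (sketch only; the value is the fraction of map tasks that must finish before reducers are scheduled, and mapred-default.xml lists 0.05 as the default):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // schedule reducers once roughly half of the maps have completed
    conf.set("mapred.reduce.slowstart.completed.maps", "0.50");
    Job job = new Job(conf, "my-job");
    // ... set mapper, reducer and input/output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same value can also be set cluster-wide in mapred-site.xml.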
>>I've read about chaining mappers, but to the best of my understanding the
>>se
Hi
I have a scenario for which I'd like to write a MR job in which Mappers do
some work and eventually the output of all mappers needs to be combined by a
single Reducer. Each Mapper outputs a key that is distinct from all other
Mappers' keys, meaning the Reducer.reduce() method always receives a single
element for each key.
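A rough sketch of a driver for that setup, funneling everything into one reducer with setNumReduceTasks(1) (SingleReducerDriver, CleansingMapper and MergeReducer are made-up names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "single-reducer-merge");
    job.setJarByClass(SingleReducerDriver.class);
    job.setMapperClass(CleansingMapper.class);   // each mapper emits keys no other mapper uses
    job.setReducerClass(MergeReducer.class);     // so reduce() sees exactly one value per key
    job.setNumReduceTasks(1);                    // a single reducer combines all map output
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}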