Re: Pipelining Mappers and Reducers

2010-07-27 Thread Gregory Lawrence
Shai, It's hard to determine what the best solution would be without knowing more about your problem. In general, combiner functions work well but they will be of little value if each mapper output contains a unique key. This is because combiner functions only "combine" multiple values associat
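
The following small Python simulation makes the point about unique keys concrete. It is not Hadoop code, and the function name `run_combiner` is hypothetical. It applies a combiner to one mapper's local output. For simplicity it groups with a dict, whereas Hadoop groups sorted spill output, but the effect on record counts is the same: records shrink only when a key repeats.

```python
from collections import defaultdict

def run_combiner(map_output, combine):
    """Group one mapper's (key, value) pairs by key and apply `combine`
    to each group, as a combiner would on that mapper's local output."""
    groups = defaultdict(list)
    for key, value in map_output:
        groups[key].append(value)
    return [(key, combine(values)) for key, values in groups.items()]

# A mapper whose output repeats keys: the combiner shrinks 4 records to 2.
repeated = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
# A mapper whose output keys are all unique: nothing to merge, still 4 records.
unique = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

combined_repeated = run_combiner(repeated, sum)
combined_unique = run_combiner(unique, sum)
print(len(repeated), "->", len(combined_repeated))  # 4 -> 2
print(len(unique), "->", len(combined_unique))      # 4 -> 4
```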

RE: Why does the MR framework sort the mapper output?

2010-07-27 Thread Chinni, Ravi
Thanks, Alex and Ken. My application does not do aggregation; it mainly does some data cleansing and transformation, so I don't need a combiner. (Also, I don't see why a combiner always needs sorted input; it should be optional and user-specified.) To take advantage of some optimizations, I n
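
The following Python analogy shows why sorted input matters for grouping. It uses `itertools.groupby`, which, like a merge over sorted map outputs, only groups adjacent records with equal keys. It is a sketch, not Hadoop's implementation: `group_sorted` is a hypothetical name, and it sorts in memory, whereas Hadoop sorts and merges spill files. Still, it shows that sorting is what places every value for a key in one contiguous run, which is the shape a reducer or a combiner consumes.

```python
from itertools import groupby
from operator import itemgetter

def group_sorted(pairs):
    """Sort (key, value) pairs by key, then yield (key, [values]) for each
    run of equal keys. groupby only merges *adjacent* equal keys, which is
    why the input must be sorted first."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, run in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in run]

pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
grouped = list(group_sorted(pairs))
print(grouped)  # [('a', [2, 4]), ('b', [1, 3])]

# Without sorting, groupby splits the same key into several groups.
unsorted_groups = [(k, [v for _, v in run]) for k, run in groupby(pairs, key=itemgetter(0))]
print(unsorted_groups)  # [('b', [1]), ('a', [2]), ('b', [3]), ('a', [4])]
```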

Re: Pipelining Mappers and Reducers

2010-07-27 Thread Shai Erera
Thanks for the prompt response, Amogh! I'm kind of a rookie with Hadoop, so please forgive my perhaps "too rookie" questions :).

> Check the property mapred.reduce.slowstart.completed.maps

From what I read here (http://hadoop.apache.org/common/docs/current/mapred-default.html), this parameter contro
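
The following is a rough sketch of what the property controls. It assumes the configured fraction is multiplied by the job's total number of map tasks and rounded up. The rounding is an assumption here. The 0.05 default is the value listed in mapred-default.html for this era of Hadoop. Reducers launched early only begin copying finished map outputs. `reduce()` itself is still invoked only after every map output has been fetched and merge-sorted. So the setting overlaps the shuffle with the map phase rather than streaming records into `reduce()`. The helper name `maps_before_reduce_starts` is hypothetical.

```python
import math

def maps_before_reduce_starts(num_maps, slowstart_fraction=0.05):
    """Number of completed map tasks required before reduce tasks are
    scheduled, ASSUMING the fraction is multiplied by the total map count
    and rounded up (the rounding rule is an assumption, not confirmed)."""
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart fraction must be between 0.0 and 1.0")
    return math.ceil(slowstart_fraction * num_maps)

print(maps_before_reduce_starts(200))       # 10 with the default 0.05
print(maps_before_reduce_starts(200, 1.0))  # 200: reducers wait for every map
print(maps_before_reduce_starts(7, 0.5))    # 4
```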

Re: Pipelining Mappers and Reducers

2010-07-27 Thread Amogh Vasekar
Hi,

>> What would really be great for me is if I could have the Reducer start
>> processing the map outputs as they are ready, and not after all Mappers finish

Check the property mapred.reduce.slowstart.completed.maps

>> I've read about chaining mappers, but to the best of my understanding the
>> se
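
For readers unfamiliar with chaining: Hadoop's `ChainMapper` (in `org.apache.hadoop.mapred.lib`) runs several Mapper classes inside a single map task, with each mapper's output feeding the next. The Python sketch below mimics only that data flow. `chain`, `clean`, and `tokenize` are hypothetical names, not Hadoop APIs. Because the whole chain runs within one map task, it does not by itself let a reducer start on a mapper's output any sooner.

```python
def _apply(mapper, records):
    # Run one mapper over a stream of (key, value) records; arguments are
    # bound at call time, so each stage keeps its own mapper.
    for key, value in records:
        yield from mapper(key, value)

def chain(*mappers):
    """Compose mapper functions so each record flows through all of them
    in one pass. Each mapper takes (key, value) and yields zero or more pairs."""
    def run(records):
        stream = iter(records)
        for mapper in mappers:
            stream = _apply(mapper, stream)
        return stream
    return run

def clean(key, value):
    # Hypothetical stage 1: strip whitespace and drop empty lines.
    text = value.strip()
    if text:
        yield key, text

def tokenize(key, value):
    # Hypothetical stage 2: emit one (word, 1) record per word.
    for word in value.split():
        yield word.lower(), 1

pipeline = chain(clean, tokenize)
lines = [(0, "  Hello world "), (1, "   "), (2, "hello Hadoop")]
result = list(pipeline(lines))
print(result)  # [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```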

Pipelining Mappers and Reducers

2010-07-27 Thread Shai Erera
Hi, I have a scenario for which I'd like to write an MR job in which Mappers do some work and, eventually, the output of all Mappers needs to be combined by a single Reducer. The output of each Mapper is distinct from that of all other Mappers, meaning the Reducer.reduce() method always receives a single eleme