Hi,

Modularity!
I've had the same question myself. However, Tom White put that thought to rest:

"It’s possible to make map and reduce functions even more composable than we have done. A mapper commonly performs input format parsing, projection (selecting the relevant fields), and filtering (removing records that are not of interest). In the mappers you have seen so far, we have implemented all of these functions in a single mapper. However, there is a case for splitting these into distinct mappers and chaining them into a single mapper using the ChainMapper library class that comes with Hadoop. Combined with a ChainReducer, you can run a chain of mappers, followed by a reducer and another chain of mappers in a single MapReduce job."
- Tom White, Hadoop: The Definitive Guide (2nd ed.)

Personally, though, I haven't used them much. They are nothing more than convenience methods; there is no "real" chaining at the framework level.

On Fri, Sep 28, 2012 at 7:02 PM, Sigurd Spieckermann
<sigurd.spieckerm...@gmail.com> wrote:
> Hi guys,
>
> I have stumbled upon ChainMapper and ChainReducer and I am wondering why
> they exist. I imagine that everything you can implement with ChainMapper and
> ChainReducer can be implemented with just a Mapper and a Reducer containing
> all the code of the respective chain-implementations. Or am I missing
> certain aspects about why they are more than just convenience concepts?
>
> Thanks for clarifying this!
> Sigurd

-- 
Harsh J
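P.S. For reference, here is a rough sketch of the [map+ / reduce map*] pattern Tom White describes, using the classic org.apache.hadoop.mapred chain API. The mapper and reducer class names (ParseMapper, FilterMapper, AggregateReducer, FormatMapper) are hypothetical placeholders for your own parse/filter/aggregate/format stages, so this is job-setup boilerplate rather than a complete runnable job:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ChainExample.class);
    conf.setJobName("chain-example");

    // First mapper in the chain: parse raw input lines.
    // Signature: addMapper(job, mapperClass, inKey, inValue, outKey, outValue,
    //                      byValue, mapperConf)
    ChainMapper.addMapper(conf, ParseMapper.class,
        LongWritable.class, Text.class,   // input types of this stage
        Text.class, Text.class,           // output types of this stage
        true, new JobConf(false));

    // Second mapper: filter out records that are not of interest.
    // Its input types must match the previous stage's output types.
    ChainMapper.addMapper(conf, FilterMapper.class,
        Text.class, Text.class,
        Text.class, Text.class,
        true, new JobConf(false));

    // The single reducer for the job...
    ChainReducer.setReducer(conf, AggregateReducer.class,
        Text.class, Text.class,
        Text.class, Text.class,
        true, new JobConf(false));

    // ...optionally followed by more mappers that post-process its output,
    // all still within the same MapReduce job.
    ChainReducer.addMapper(conf, FormatMapper.class,
        Text.class, Text.class,
        Text.class, Text.class,
        true, new JobConf(false));

    JobClient.runJob(conf);
  }
}
```

Note that each stage's input key/value classes must match the previous stage's output classes; the framework checks this when you add the stage. The byValue flag (true above) makes the chain pass key/value objects by value between stages, which is the safe choice when a mapper reuses its output objects.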