Hello all.
In general, almost every mapper in a Hadoop application has a simple structure: it reads one record from the input split and then writes intermediate output to the local disk. However, some mappers, such as CanopyMapper, Step0Mapper, and NaiveBayesThetaMapper, read all the records in their input split, compute something over them, and only write their intermediate output to the local disk in the "cleanup" stage.

I am now looking for cases like these in order to improve Hadoop scheduling. I believe many complex cases of this kind exist, but it is very hard for me to find real examples that are in practical use today. Could you let me know if you know of any applications like this? It is not a problem if your application, or an application you know of, is incomplete, because I just want to show that such cases exist in the real world.

Thanks and regards,
Jongse Park
