One of the assumptions MapReduce makes, I think, is that the size of the map's output is
smaller than its input, although many applications produce output the same size
as their input, such as sort, merge, etc.
For benchmarking purposes, I am looking for some non-trivial, real-life
applications which create
Another case is augmenting data. This is sometimes done outside of MR
in an ETL flow, but it can also be done as an MR job. Doing something like
this uses Hadoop to handle the scaling issues, but it isn't really
what MR is intended for.
A real example of this is:
* Input: standard Apache weblog
*
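To make the augmentation case concrete, here is a minimal sketch of a map-only step over Apache Common Log Format lines, written as Python functions that could back a Hadoop Streaming mapper. The rest of the example above is cut off, so the derived fields used here (normalized timestamp, method, path, status class) and the names `augment_line` and `run_mapper` are hypothetical. Each output record keeps the original line and appends fields, so the map output is strictly larger than the input, which is the situation the original question asks about.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def augment_line(line):
    """Return a tab-separated record (host key, original line, derived fields),
    or None if the line is not in Common Log Format.

    The choice of derived fields is hypothetical, for illustration only.
    """
    m = CLF.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    fields = [
        m.group("host"),              # key: client host
        line.rstrip("\n"),            # the original record, kept intact
        ts.isoformat(),               # normalized timestamp
        m.group("method"),
        m.group("path"),
        m.group("status")[0] + "xx",  # status class, e.g. "2xx"
    ]
    return "\t".join(fields)


def run_mapper(lines, out):
    """Map-only pass: write one augmented record per parseable input line."""
    for line in lines:
        record = augment_line(line)
        if record is not None:
            out.write(record + "\n")
```

As a Streaming mapper this would be driven with `run_mapper(sys.stdin, sys.stdout)`. Because every record carries the original line plus extra columns, the output-to-input byte ratio is always above 1, so a job like this stresses shuffle and HDFS write bandwidth rather than benefiting from map-side reduction.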