Applications creates bigger output than input?

2011-04-29 Thread elton sky
One of the assumptions MapReduce makes, I think, is that the size of a map's output is smaller than its input, although many applications produce output the same size as their input, such as sort and merge. For benchmarking purposes, I am looking for some non-trivial, real-life applications that create

Re: Applications creates bigger output than input?

2011-04-29 Thread John Meagher
Another case is augmenting data. This is sometimes done outside of MR in an ETL flow, but it can also be done as an MR job. Doing so uses Hadoop to handle the scaling issues, but it really isn't what MR is intended for. A real example of this is: * Input: standard Apache weblog *