
Applications that create bigger output than input?

2011-04-29 elton sky

One of the assumptions MapReduce makes, I think, is that the map's output is smaller than its input. However, many applications produce output the same size as their input, such as sort, merge, etc. For benchmarking purposes, I am looking for some non-trivial, real-life applications that create bigger output than input.
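To make the size question concrete, here is a minimal Python sketch, written in the line-in, records-out style of a Hadoop Streaming mapper, that compares map output bytes with input bytes for two toy mappers. `identity_map` stands in for a sort-style job, where records pass through unchanged; `pairs_map` is a hypothetical expanding mapper (not from the thread) that emits one record per unordered word pair in a line. All names are illustrative, and sizes assume one newline-terminated record per line.

```python
from itertools import combinations

def identity_map(line):
    # Sort-style map: each record passes through unchanged.
    yield line

def pairs_map(line):
    # Hypothetical expanding map: one "a,b<TAB>1" record per unordered word pair.
    for a, b in combinations(line.split(), 2):
        yield f"{a},{b}\t1"

def size_ratio(mapper, lines):
    # Bytes are counted as UTF-8 text with one trailing newline per record.
    in_bytes = sum(len(line.encode()) + 1 for line in lines)
    out_bytes = sum(len(rec.encode()) + 1 for line in lines for rec in mapper(line))
    return out_bytes / in_bytes

lines = ["the quick brown fox", "jumps over the lazy dog"]
print(size_ratio(identity_map, lines))         # 1.0
print(round(size_ratio(pairs_map, lines), 2))  # 4.27
```

A line of n words yields n(n-1)/2 pair records, so the expansion grows quadratically with line length, while the identity mapper stays at exactly 1.0.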

Re: Applications that create bigger output than input?

2011-04-29 John Meagher

Another case is augmenting data. This is sometimes done outside of MapReduce (MR) in an ETL flow, but it can also be done as an MR job. Doing this uses Hadoop to handle the scaling issues, but it isn't really what MR is intended for. A real example of this is:

* Input: standard Apache weblog
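The example above is cut off after the input, so the following is only a minimal Python sketch of what a map-only augmentation step over an Apache weblog might look like. It parses Common Log Format lines and appends derived fields; the choice of fields (day, hour, status class, and a region from the hypothetical `HOST_REGION` table) is an assumption for illustration, not what the original poster described. Because every output record is the input record plus extra columns, the map output is necessarily larger than the input.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "METHOD path protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical side table; a real job might ship this to every mapper.
HOST_REGION = {"10.0.0.1": "internal"}

def augment(line):
    """Map step: return the original record with derived fields appended."""
    m = CLF.match(line)
    if m is None:
        return None  # malformed record; skip it
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    extra = [
        ts.strftime("%Y-%m-%d"),                # day bucket
        str(ts.hour),                           # hour of day, in the log's own offset
        m["status"][0] + "xx",                  # status class, e.g. 2xx or 4xx
        HOST_REGION.get(m["host"], "unknown"),  # hypothetical lookup
    ]
    return line + "\t" + "\t".join(extra)

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
out = augment(sample)
print(out[len(sample):].split("\t")[1:])  # ['2000-10-10', '13', '2xx', 'unknown']
```

Keeping the original line intact and appending tab-separated columns lets downstream jobs treat the enriched log as a superset of the raw one.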