Re: HADOOP-7106: Re-organize hadoop subversion layout

2011-04-29 Thread Todd Lipcon
On Thu, Apr 28, 2011 at 10:06 PM, Nigel Daley nda...@mac.com wrote: As announced last week, I'm planning to do this at 2pm PDT tomorrow (Friday) April 29. Suresh, when do you plan to commit HDFS-1052? That should be done first. Owen or Todd, did you want to follow Paul's advice: If you're

Applications creates bigger output than input?

2011-04-29 Thread elton sky
One of the assumptions MapReduce makes, I think, is that the size of a map's output is smaller than its input, although many applications, such as sort and merge, produce output the same size as their input. For benchmarking purposes, I am looking for some non-trivial, real-life applications which create

Re: Applications creates bigger output than input?

2011-04-29 Thread John Meagher
Another case is augmenting data. This is sometimes done outside of MR in an ETL flow, but can be done as an MR job. Doing something like this uses Hadoop to handle the scaling issues, but it really isn't what MR is intended for. A real example of this is: * Input: standard Apache weblog *

Re: HADOOP-7106: Re-organize hadoop subversion layout

2011-04-29 Thread Owen O'Malley
On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote: Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with? It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 April