Thanks Jeff. Those are all valuable links. It seems there are quite a few
people out there working on incremental MapReduce.
2010/9/1 Jeff Hammerbacher
> Hey Stephen,
>
> There have been several proposals for implementing such a feature. See
> https://issues.apache.org/jira/browse/MAPREDUCE-121
> That isn't true. We are actively adding new features. However, there
> is certainly a focus on doing MapReduce well rather than trying to
> implement all potential distributed computation paradigms. I suspect
> that the right solution is doing two levels like Mesos:
>
> http://www.eecs.berkeley.e
Well, we run the jobs serially. I'm 100% positive that the job consuming
the loader output started after the loader completed, where completion is
as reported by Hadoop. The delay between the end of that job and the start
of the next one is probably no more than a few seconds, though.
I'm not sure that
One possibility, given the asynchronous nature of your loader, is that the
consumer job started before all files from the loader were completely
written (propagated).
Can you describe what problem you encountered with OutputCollector?
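To illustrate the kind of guard that would rule out the possibility above,
here is a minimal sketch. It is not Hadoop code: it polls a local directory
rather than HDFS, and the function name, the file names, and the "exists and
is non-empty" test are all assumptions made for illustration. It is also only
a rough check, since a non-empty file is not proof that writing has finished.

```python
import os
import time


def wait_for_outputs(output_dir, expected_files, timeout_s=60.0, poll_s=1.0):
    """Return True once every expected file exists and is non-empty.

    Returns False if the timeout expires first. Hypothetical helper:
    treating "non-empty" as "written" is an assumption, and a loader
    that legitimately writes empty files would need a different test.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        missing = []
        for name in expected_files:
            path = os.path.join(output_dir, name)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                missing.append(name)
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The consumer job would be submitted only when this returns True; a False
result would at least confirm that the loader output was incomplete at that
point.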
On Thu, Sep 2, 2010 at 10:35 AM, Elton Pinto wrote:
> Hello,
Hello,
I apologize if this topic has already been brought up, but I was unable to
find it by searching around.
We recently discovered an issue in one of our jobs where the output of one
job does not seem to be making it into another job. The first job is a
loader job that's just a map step for as