>>However, from r(i) to m(i+1) there is an unnecessary barrier. m(i+1) should not
>>need to wait for all reducers r(i) to finish, right?
Yes, but r(i+1) can't be in the same job, since that would require another sort
and shuffle phase (barrier). So you would end up doing job(i): m(i) r(i) m(i+1).
Hi,
>>Will there be a re-assignment of Map & Reduce nodes by the Master?
In general, with the available schedulers, I believe so. If they weren't
re-assigned and I submitted job 2 needing a different/additional set of
inputs, the data locality considerations would be somewhat hampered, right?
When we had HOD, t
Hi Ed,
Currently my program is like this: m1, r1, m2, r2, ..., mK, rK. The
barrier between m(i) and r(i) is acceptable, since the reducer has to wait
for all map tasks to finish. However, from r(i) to m(i+1) there is an
unnecessary barrier. m(i+1) should not need to wait for all reducers
r(i) to finish, right?
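To be concrete, the driver is roughly the following sketch (IterDriver,
IterMapper, IterReducer and the iter_* paths are placeholders, not my
real classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterDriver {
      public static void main(String[] args) throws Exception {
        int K = Integer.parseInt(args[0]);  // number of rounds
        Configuration conf = new Configuration();
        Path input = new Path("iter_0");
        for (int i = 1; i <= K; i++) {
          Job job = new Job(conf, "iteration " + i);
          job.setJarByClass(IterDriver.class);
          job.setMapperClass(IterMapper.class);    // m(i)
          job.setReducerClass(IterReducer.class);  // r(i)
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.addInputPath(job, input);
          Path output = new Path("iter_" + i);
          FileOutputFormat.setOutputPath(job, output);
          // This is the barrier I mean: m(i+1) cannot start until
          // every r(i) task has finished and written its output.
          job.waitForCompletion(true);
          input = output;
        }
      }
    }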
Felix,
You can use ChainMapper and ChainReducer to create jobs of the form
M+RM*. Is that what you're looking for? I'm not aware of anything that
allows you to have multiple reduce functions without the job
"barrier".
Ed
On Wed, Feb 3, 2010 at 9:41 PM, Felix Halim wrote:
> Hi all,
>
> As far as
The Hadoop PMC has voted to make Vinod Kumar V. a committer on Hadoop
Map-Reduce.
Congratulations, Vinod.
Thanks again for all your valuable contributions; we look forward to
more!
Arun
Hi all,
As far as I know, a barrier exists between the map and reduce functions in
one round of MR. There is another barrier for the reducer to end the
job for that round. However, if we want to run for several rounds using
the same map and reduce functions, then the barrier between the reduce and
the map of the next round seems unnecessary.
Hi Amogh,
Thank you for the reply.
>>> What you need, I believe, is “just run on whatever map has”.
You got that right :). An example of a sequential program would be
Bubble Sort, which needs several iterations to reach the end result, and
in each iteration it needs to work on the previous iteration's output.
Hi,
If each of your sequential iterations is a map+reduce, then no.
The lifetime of a split is confined to a single map-reduce job. The split
is actually a reference to the data, which is used to schedule the job as
close as possible to the data. The record reader then uses the same object
to read and pass the records in the split.
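To illustrate that lifetime with the old mapred API, here is a standalone
sketch (not the framework's actual code; it just walks the same steps):

    import java.util.Arrays;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class SplitDemo {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitDemo.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        TextInputFormat fmt = new TextInputFormat();
        fmt.configure(conf);

        // Splits are computed per job; each is just a reference to data
        // plus host hints the scheduler uses to place the map task nearby.
        InputSplit[] splits = fmt.getSplits(conf, 1);
        for (InputSplit s : splits) {
          System.out.println(s + " on " + Arrays.toString(s.getLocations()));
        }

        // The same split object is then handed to a record reader, which
        // does the actual reading and passes the records to the map function.
        RecordReader<LongWritable, Text> rr =
            fmt.getRecordReader(splits[0], conf, Reporter.NULL);
        LongWritable key = rr.createKey();
        Text value = rr.createValue();
        while (rr.next(key, value)) {
          // in a real job the framework would call map(key, value) here
        }
        rr.close();
      }
    }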
We have a Hadoop job running and have used custom counters to track a few
counters (like the number of successfully processed documents matching
certain conditions).
Since we need to get these counters even while the Hadoop job is running,
we wrote another Java program to read these counters
* Counter re
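For reference, a separate program that reads counters from a running job
with the old mapred API looks roughly like this sketch (the job ID and
the counter enum are placeholders; the enum has to be the same one the
job itself updates):

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class CounterPoller {
      // Placeholder: must match the enum class the running job increments
      enum MyCounters { MATCHED_DOCS }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();  // must point at the same JobTracker
        JobClient client = new JobClient(conf);
        // args[0] is the job id, e.g. "job_201002040000_0001" (placeholder)
        RunningJob running = client.getJob(JobID.forName(args[0]));
        Counters counters = running.getCounters();
        System.out.println("matched so far: "
            + counters.getCounter(MyCounters.MATCHED_DOCS));
      }
    }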