Re: Barrier between reduce and map of the next round

2010-02-03 Thread Amogh Vasekar
>> However, from ri to m(i+1) there is an unnecessary barrier. m(i+1) should not need to wait for all reducers ri to finish, right?

Yes, but r(i+1) can't be in the same job, since that requires another sort and shuffle phase (barrier). So you would end up doing job(i): m(i) r(i) m(i+1). Job

Re: avoiding data redistribution in iterative mapreduce

2010-02-03 Thread Amogh Vasekar
Hi,

>> Will there be a re-assignment of Map & Reduce nodes by the Master?

In general, using the available schedulers, I believe so. Because if there weren't, and I submitted job 2 needing a different/additional set of inputs, the data locality considerations would be somewhat hampered, right? When we had HOD, t

Re: Barrier between reduce and map of the next round

2010-02-03 Thread Felix Halim
Hi Ed, Currently my program is like this: m1, r1, m2, r2, ..., mK, rK. The barrier between mi and ri is acceptable, since the reducer has to wait for all map tasks to finish. However, from ri to m(i+1) there is an unnecessary barrier. m(i+1) should not need to wait for all reducers ri to finish, right?
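The per-round pattern described above (m1, r1, ..., mK, rK) can be sketched as a driver loop that chains K jobs, each job's output feeding the next job's input. This is a minimal, in-memory Python analogue (not Hadoop API code); the toy map/reduce functions `m` and `r` are hypothetical. It makes both barriers visible: the sort/shuffle inside `run_job` is the in-job barrier between map and reduce, and the job boundary in the driver loop is the extra barrier between r(i) and m(i+1) being discussed.

```python
from itertools import groupby

def run_job(records, map_fn, reduce_fn):
    """One MapReduce round: map, shuffle (sort by key), reduce.
    The sort/shuffle is the in-job barrier: reduce_fn cannot start
    until every map output for its key is available."""
    mapped = [kv for rec in records for kv in map_fn(rec)]
    mapped.sort(key=lambda kv: kv[0])          # shuffle barrier
    out = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        out.extend(reduce_fn(key, [v for _, v in group]))
    return out

# Toy round: increment every value by 1; reduce passes values through.
def m(rec):
    k, v = rec
    yield k, v + 1

def r(key, values):
    for v in values:
        yield key, v

data = [("a", 0), ("b", 0)]
for i in range(3):                 # K = 3 rounds: job(i) = m(i), r(i)
    data = run_job(data, m, r)     # job boundary = barrier before m(i+1)
print(data)                        # [('a', 3), ('b', 3)]
```

In the driver loop, m(i+1) cannot see any output until `run_job` returns, i.e. until every r(i) has finished, which is exactly the unnecessary barrier the thread is asking about.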

Re: Barrier between reduce and map of the next round

2010-02-03 Thread Ed Mazur
Felix, You can use ChainMapper and ChainReducer to create jobs of the form M+RM*. Is that what you're looking for? I'm not aware of anything that allows you to have multiple reduce functions without the job "barrier". Ed

On Wed, Feb 3, 2010 at 9:41 PM, Felix Halim wrote:
> Hi all,
>
> As far as
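The M+RM* form Ed mentions means one or more mappers, a single sort/shuffle and reduce, then zero or more mappers applied to the reducer output within the same job, with no second shuffle. This is a toy Python analogue of that pattern, not the Hadoop ChainMapper/ChainReducer API itself; all function names here are hypothetical.

```python
from itertools import groupby

def run_chained_job(records, pre_maps, reduce_fn, post_maps):
    """M+RM*: chained mappers, then ONE shuffle+reduce, then more
    mappers on the reducer output with no extra shuffle."""
    for map_fn in pre_maps:                    # M+ : mappers before reduce
        records = [kv for rec in records for kv in map_fn(rec)]
    records.sort(key=lambda kv: kv[0])         # the single shuffle barrier
    reduced = []
    for key, group in groupby(records, key=lambda kv: kv[0]):
        reduced.extend(reduce_fn(key, [v for _, v in group]))
    for map_fn in post_maps:                   # M* : after the reduce,
        reduced = [kv for rec in reduced       # no barrier between these
                   for kv in map_fn(rec)]
    return reduced

tokenize = lambda rec: [(w, 1) for w in rec[1].split()]
count    = lambda key, vals: [(key, sum(vals))]
upper    = lambda kv: [(kv[0].upper(), kv[1])]

out = run_chained_job([(0, "a b a")], [tokenize], count, [upper])
print(out)  # [('A', 2), ('B', 1)]
```

Note the limitation Ed points out: the post-reduce mappers add no new shuffle, so a second *reduce* (which needs its outputs grouped by key again) still forces a new job, and with it the barrier.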

New Hadoop Map-Reduce Committer: Vinod Kumar V.

2010-02-03 Thread Arun C Murthy
The Hadoop PMC has voted to make Vinod Kumar V. a committer on Hadoop Map-Reduce. Congratulations, Vinod. Thanks again for all your valuable contributions, we look forward to more! Arun

Barrier between reduce and map of the next round

2010-02-03 Thread Felix Halim
Hi all, As far as I know, a barrier exists between the map and reduce functions in one round of MR. There is another barrier where the reducer ends the job for that round. However, if we want to run several rounds using the same map and reduce functions, then the barrier between reduce and the map of

Re: avoiding data redistribution in iterative mapreduce

2010-02-03 Thread Raghava Mutharaju
Hi Amogh, Thank you for the reply.

>>> What you need, I believe, is “just run on whatever map has”.

You got that right :). An example of a sequential program would be Bubble Sort, which needs several iterations for the end result, and in each iteration it needs to work on the previ

Re: avoiding data redistribution in iterative mapreduce

2010-02-03 Thread Amogh Vasekar
Hi, If each of your sequential iterations is map+reduce, then no. The lifetime of a split is confined to a single map-reduce job. The split is actually a reference to the data, which is used to schedule the job as close as possible to the data. The record reader then uses the same object to pass the input split. W

Reading Counters

2010-02-03 Thread Rajan Dev
We have a Hadoop job running and have used custom counters to track a few counters (like the number of successfully processed documents matching certain conditions). Since we need to get these counters even while the Hadoop job is running, we wrote another Java program to read these counters * Counter re
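Reading counters from a separate client while the job is still running is essentially a polling problem: the job increments shared counters, and an independent reader samples them mid-flight. This is a minimal thread-based Python analogue of that setup, not the Hadoop client API; the `Counters` class and the counter name are hypothetical.

```python
import threading

class Counters:
    """Thread-safe named counters, standing in for a job's custom counters."""
    def __init__(self):
        self._lock = threading.Lock()
        self._vals = {}
    def incr(self, name, by=1):
        with self._lock:
            self._vals[name] = self._vals.get(name, 0) + by
    def get(self, name):
        with self._lock:
            return self._vals.get(name, 0)

counters = Counters()

def job():                          # stands in for the running Hadoop job
    for _ in range(1000):
        counters.incr("PROCESSED_DOCS")

t = threading.Thread(target=job)
t.start()
# A separate "client" can sample the counter while the job runs...
mid = counters.get("PROCESSED_DOCS")
t.join()
# ...and read the final value once the job completes.
final = counters.get("PROCESSED_DOCS")
print(mid, final)
```

The mid-run sample is a snapshot (any value between 0 and the final total), which mirrors why counter values read from a still-running job should be treated as progress indicators rather than final results.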