Re: Reduce doesn't start until map finishes

2009-03-24, Owen O'Malley
What happened is that we added fast start (HADOOP-3136), which launches more than one task per heartbeat. Previously, if your maps didn't take very long, they finished before the heartbeat and the task tracker was assigned a new map task. A side effect was that no reduce tasks were launched until
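To make "more than one task per heartbeat" concrete, here is a toy arithmetic model. It assumes a tracker with a fixed number of free slots and a cap on how many tasks are handed out per heartbeat. `HeartbeatModel`, `heartbeatsToFill`, and the slot counts are hypothetical and do not reflect HADOOP-3136's actual scheduling code.

```java
// Toy model (not Hadoop code): heartbeats a TaskTracker needs to fill its
// free slots when at most `tasksPerHeartbeat` tasks are assigned per heartbeat.
class HeartbeatModel {
    static int heartbeatsToFill(int freeSlots, int tasksPerHeartbeat) {
        if (freeSlots < 0 || tasksPerHeartbeat <= 0) {
            throw new IllegalArgumentException("invalid slot or per-heartbeat count");
        }
        // Ceiling division: each heartbeat fills up to tasksPerHeartbeat slots.
        return (freeSlots + tasksPerHeartbeat - 1) / tasksPerHeartbeat;
    }

    public static void main(String[] args) {
        int freeSlots = 4; // hypothetical number of free slots on one tracker
        System.out.println("one task per heartbeat: " + heartbeatsToFill(freeSlots, 1));
        System.out.println("up to 4 per heartbeat:  " + heartbeatsToFill(freeSlots, 4));
    }
}
```

With 4 free slots, one task per heartbeat needs 4 heartbeats to fill them, while a cap of 4 fills them in a single heartbeat.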

Re: Reduce doesn't start until map finishes

2009-03-24, Rasit OZDAS
Just to let you know: we installed version 0.21.0-dev and the issue no longer occurs.

Re: Reduce doesn't start until map finishes

2009-03-05, Rasit OZDAS
So is there currently no solution to my problem? Should I just live with it, or should we file a JIRA for this? What do you think?

Re: Reduce doesn't start until map finishes

2009-03-03, Nick Cen
Thanks. About the "Secondary Sort", can you provide an example? What do the intermediate keys stand for? Assume I have two mappers, m1 and m2. The output of m1 is (k1,v1),(k2,v2) and the output of m2 is (k1,v3),(k2,v4). Assume k1 and k2 belong to the same partition and k1 < k2, so I think the
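For the m1/m2 question, a simplified model may help: each map's output for a partition is already sorted by key, so the reducer can k-way merge those runs and group values by key. This is a sketch of the idea, not Hadoop's merge code; `ReduceMerge` and its types are hypothetical. Note that values sharing a key (v1 and v3 for k1) come out in no guaranteed order, which is the gap a secondary sort fills.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Simplified model of the reduce-side merge (not Hadoop's implementation).
class ReduceMerge {
    record Pair(String key, String value) {}

    // Current position inside one sorted map-output run.
    private record Cursor(List<Pair> run, int pos) {
        Pair head() { return run.get(pos); }
    }

    static Map<String, List<String>> mergeAndGroup(List<List<Pair>> sortedRuns) {
        // Min-heap ordered by the key at each run's current position.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head().key().compareTo(b.head().key()));
        for (List<Pair> run : sortedRuns) {
            if (!run.isEmpty()) heap.add(new Cursor(run, 0));
        }
        // LinkedHashMap keeps keys in the order the merge produces them.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Pair p = c.head();
            // Values for equal keys arrive in no guaranteed order.
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
            if (c.pos() + 1 < c.run().size()) {
                heap.add(new Cursor(c.run(), c.pos() + 1));
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The m1/m2 example from the message: k1 < k2, same partition.
        List<Pair> m1 = List.of(new Pair("k1", "v1"), new Pair("k2", "v2"));
        List<Pair> m2 = List.of(new Pair("k1", "v3"), new Pair("k2", "v4"));
        System.out.println(mergeAndGroup(List.of(m1, m2)));
    }
}
```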

Re: Reduce doesn't start until map finishes

2009-03-03, Chris Douglas
The output of each map is sorted by partition, and by key within each partition. Each reduce merges the sorted map outputs assigned to its partition. The following may be helpful: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html If your job requires total order, consi
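The map-side half of this can be sketched as: tag each output pair with a partition, then sort by partition and by key within it. The partition formula mirrors Hadoop's default HashPartitioner, but here it is applied to Java `String` keys, whereas Hadoop hashes the Writable key (e.g. `Text`), so real partition numbers would differ. `MapSideSort` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MapSideSort {
    record Pair(String key, int value) {}

    // A map output pair tagged with the reduce partition it belongs to.
    record Tagged(int partition, Pair pair) {}

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Tag every pair with its partition, then sort by partition and by key within it.
    static List<Tagged> partitionAndSort(List<Pair> mapOutput, int numReduceTasks) {
        List<Tagged> tagged = new ArrayList<>();
        for (Pair p : mapOutput) {
            tagged.add(new Tagged(partitionFor(p.key(), numReduceTasks), p));
        }
        tagged.sort(Comparator.comparingInt(Tagged::partition)
                .thenComparing((Tagged t) -> t.pair().key()));
        return tagged;
    }

    public static void main(String[] args) {
        List<Pair> out = List.of(new Pair("k2", 1), new Pair("k1", 2), new Pair("k3", 3));
        for (Tagged t : partitionAndSort(out, 2)) {
            System.out.println(t.partition() + "\t" + t.pair().key());
        }
    }
}
```

Each reduce then only has to merge the already-sorted runs for its own partition number.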

Re: Reduce doesn't start until map finishes

2009-03-03, Nick Cen
Can you provide more info about sorting? Does the sort happen on the whole data set, or just on the specified partition?

Re: Reduce doesn't start until map finishes

2009-03-03, Mikhail Yakshin
On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
> This is normal behavior. The Reducer is guaranteed to receive all the
> results for its partition in sorted order. No reduce can start until all the
> maps are completed, since any running map could emit a result that would
> violate the order

Re: Reduce doesn't start until map finishes

2009-03-03, Chris Douglas
This is normal behavior. The Reducer is guaranteed to receive all the results for its partition in sorted order. No reduce can start until all the maps are completed, since any running map could emit a result that would violate the order for the results it currently has. -C
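The ordering argument boils down to a one-line check: once reduce() has consumed a key, a still-running map that later emits a smaller key would break the sorted-order guarantee. `OrderGuarantee` and `lateKeyKeepsOrder` are hypothetical names; this is a toy illustration, not Hadoop code.

```java
// Toy illustration (not Hadoop code) of why reduce() must wait for every map.
class OrderGuarantee {
    // True if a key emitted late by a running map could still be delivered
    // without violating sorted order, given the last key reduce() already saw.
    static boolean lateKeyKeepsOrder(String lastDeliveredKey, String lateKey) {
        return lastDeliveredKey == null || lateKey.compareTo(lastDeliveredKey) >= 0;
    }

    public static void main(String[] args) {
        // Suppose reduce() had already been handed k2 while one map was still running...
        String lastDelivered = "k2";
        // ...and that map then emits k1, which sorts before k2.
        System.out.println(lateKeyKeepsOrder(lastDelivered, "k1")); // false
    }
}
```

Since the framework cannot know in advance which keys an unfinished map will emit, the only safe point to start calling reduce() is after all maps complete.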

Re: Reduce doesn't start until map finishes

2009-03-01, Rasit OZDAS
Strange: last night I tried 1 input files (maps), and the waiting time after the maps increases (probably linearly).

Re: Reduce doesn't start until map finishes

2009-03-01, Rasit OZDAS
I have 6 reducers, Nick; still no luck.

Re: Reduce doesn't start until map finishes

2009-03-01, Nick Cen
How many reducers do you have? You should make this value larger than 1 to make the mappers and reducers run concurrently. You can set this value with JobConf.setNumReduceTasks().

Reduce doesn't start until map finishes

2009-03-01, Rasit OZDAS
Hi! Whatever code I run on Hadoop, the reduce starts a few seconds after the map finishes. Worse, when I run 10 jobs in parallel (using threads and submitting them one after another), all maps finish sequentially, and then after 8-10 seconds the reduces start. I also use the reducer as the combiner; my cluster has 6 machines, n
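Using the reducer as the combiner is only safe when the reduce function gives the same answer on raw values as on partial results, as with summing counts, because Hadoop may run the combiner zero or more times. The sketch below assumes a hypothetical word-count-style job (the poster's actual code is not shown); `CombinerAsReducer` and its methods are illustrative names. It checks that pre-combining each map's output leaves the final totals unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerAsReducer {
    // Shared reduce logic: sum all counts for one key (associative and commutative).
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    // Group (key, count) pairs by key and apply reduce() to each group.
    static Map<String, Integer> applyPerKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, v) -> out.put(k, reduce(v)));
        return out;
    }

    // Combiner runs on each map's output, then the reducer runs on the union.
    static Map<String, Integer> withCombiner(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> out : mapOutputs) {
            combined.addAll(applyPerKey(out).entrySet());
        }
        return applyPerKey(combined);
    }

    // Reference: no combiner, the reducer sees every raw pair.
    static Map<String, Integer> withoutCombiner(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> out : mapOutputs) all.addAll(out);
        return applyPerKey(all);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> maps = List.of(
            List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1)),
            List.of(Map.entry("a", 1), Map.entry("c", 1)));
        System.out.println(withCombiner(maps));    // {a=3, b=1, c=1}
        System.out.println(withoutCombiner(maps)); // {a=3, b=1, c=1}
    }
}
```

A reduce function such as an average would not pass this check, so it could not simply be reused as its own combiner.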