What happened is that we added fast start (HADOOP-3136), which
launches more than one task per heartbeat. Previously, if your maps
didn't take very long, they finished before the heartbeat and the task
tracker was assigned a new map task. A side effect was that no reduce
tasks were launched until
Just to let you know: we installed v0.21.0-dev and the issue no longer occurs.
2009/3/6 Rasit OZDAS
> So, is there currently no solution to my problem?
> Should I live with it? Or do we have to have a JIRA for this?
> What do you think?
>
>
> 2009/3/4 Nick Cen
>
> Thanks, about the "Secondary Sort",
So, is there currently no solution to my problem?
Should I live with it? Or do we have to have a JIRA for this?
What do you think?
2009/3/4 Nick Cen
> Thanks. About the "Secondary Sort": can you provide an example? What do
> the intermediate keys stand for?
>
> Assume I have two mappers, m1
Thanks. About the "Secondary Sort": can you provide an example? What do
the intermediate keys stand for?
Assume I have two mappers, m1 and m2. The output of m1 is (k1,v1),(k2,v2) and
the output of m2 is (k1,v3),(k2,v4). Assume k1 and k2 belong to the same
partition and k1 < k2, so I think the
The output of each map is sorted by partition and by key within that
partition. The reduce merges sorted map output assigned to its
partition into the reduce. The following may be helpful:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
If your job requires total order, consi
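To make the sort-then-merge behavior concrete, here is a minimal sketch in plain Java (no Hadoop classes) using the m1/m2 example from earlier in the thread. It assumes all keys fall in one partition; the class and method names (ShuffleSketch, sortByKey, merge) are made up for illustration and are not Hadoop APIs. The tie-break on run index only makes the output deterministic; Hadoop itself makes no promise about value order within a key.

```java
import java.util.*;

public class ShuffleSketch {
    // Cursor over one sorted map output ("run"). runIndex is used only to break ties.
    static final class Cursor {
        final int runIndex;
        final Iterator<String[]> it;
        String[] head; // current {key, value} record

        Cursor(int runIndex, List<String[]> run) {
            this.runIndex = runIndex;
            this.it = run.iterator();
            this.head = it.next(); // caller guarantees the run is non-empty
        }
    }

    // Map side: sort one map's output by key (single partition assumed).
    public static List<String[]> sortByKey(List<String[]> mapOutput) {
        List<String[]> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing((String[] r) -> r[0]));
        return sorted;
    }

    // Reduce side: k-way merge of the sorted runs, grouping values of equal keys.
    public static LinkedHashMap<String, List<String>> merge(List<List<String[]>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparing((Cursor c) -> c.head[0]).thenComparingInt(c -> c.runIndex));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new Cursor(i, runs.get(i)));
        }
        LinkedHashMap<String, List<String>> grouped = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // smallest key across all runs
            grouped.computeIfAbsent(c.head[0], k -> new ArrayList<>()).add(c.head[1]);
            if (c.it.hasNext()) {
                c.head = c.it.next();
                heap.add(c); // re-insert with its next record
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> m1 = sortByKey(Arrays.asList(
            new String[] {"k2", "v2"}, new String[] {"k1", "v1"}));
        List<String[]> m2 = sortByKey(Arrays.asList(
            new String[] {"k1", "v3"}, new String[] {"k2", "v4"}));
        System.out.println(merge(Arrays.asList(m1, m2))); // prints {k1=[v1, v3], k2=[v2, v4]}
    }
}
```

In this sketch the reduce sees k1 with both of its values before it sees k2, which is the ordering guarantee described above.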
Can you provide more info about sorting? Does the sort happen on the whole
data set, or just within the specified partition?
2009/3/4 Mikhail Yakshin
> On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
> > This is normal behavior. The Reducer is guaranteed to receive all the
> > results for its part
On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
> This is normal behavior. The Reducer is guaranteed to receive all the
> results for its partition in sorted order. No reduce can start until all the
> maps are completed, since any running map could emit a result that would
> violate the order
This is normal behavior. The Reducer is guaranteed to receive all the
results for its partition in sorted order. No reduce can start until
all the maps are completed, since any running map could emit a result
that would violate the order for the results it currently has. -C
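A small sketch of the ordering argument, in plain Java with made-up names (EarlyReduceSketch, violatesOrder): suppose, hypothetically, a reduce had started early and already processed some keys, and then a still-running map emitted another key. The check below flags the case where that late key sorts before what was already processed. An equal key would also be a problem if reduce() for that group had already returned; this sketch checks only strict ordering.

```java
import java.util.*;

public class EarlyReduceSketch {
    // True if a key arriving from a still-running map sorts before the last key
    // the reducer has already processed, i.e. sorted order would be broken.
    public static boolean violatesOrder(List<String> consumedKeys, String lateKey) {
        if (consumedKeys.isEmpty()) return false;
        String last = consumedKeys.get(consumedKeys.size() - 1);
        return lateKey.compareTo(last) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical: the reduce started early and already processed k1 and k2.
        List<String> consumed = Arrays.asList("k1", "k2");
        System.out.println(violatesOrder(consumed, "k1")); // late map emits k1: prints true
        System.out.println(violatesOrder(consumed, "k3")); // late map emits k3: prints false
    }
}
```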
On Mar 1, 2009,
Strange: last night I tried 1 input files (maps), and the waiting time
after the maps increases (probably linearly)
2009/3/2 Rasit OZDAS
> I have 6 reducers, Nick; still no luck.
>
> 2009/3/2 Nick Cen
>
> How many reducers do you have? You should make this value larger than 1 to
>> make mapper
I have 6 reducers, Nick; still no luck.
2009/3/2 Nick Cen
> How many reducers do you have? You should make this value larger than 1 to
> make the mappers and reducers run concurrently. You can set this value with
> JobConf.setNumReduceTasks().
>
>
> 2009/3/2 Rasit OZDAS
>
> > Hi!
> >
> > Whatever c
How many reducers do you have? You should make this value larger than 1 to
make the mappers and reducers run concurrently. You can set this value with
JobConf.setNumReduceTasks().
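For context on what the reducer count changes, here is a rough sketch of how a key could be assigned to one of the reduce partitions. It assumes hash partitioning in the style of Hadoop's default HashPartitioner; the class and method names (PartitionSketch, partitionFor) are invented for illustration and this is plain Java, not the Hadoop API.

```java
public class PartitionSketch {
    // Maps a key to a reduce partition in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 6; // the reducer count mentioned in this thread
        for (String key : new String[] {"k1", "k2"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

With a single reduce task every key lands in partition 0, so all map output goes to one reducer.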
2009/3/2 Rasit OZDAS
> Hi!
>
> Whatever code I run on hadoop, reduce starts a few seconds after map
> finishes.
> And wo
Hi!
Whatever code I run on hadoop, reduce starts a few seconds after map
finishes.
And worse, when I run 10 jobs in parallel (using threads and sending one after
another),
all maps finish sequentially, then after 8-10 seconds the reduces start.
I also use the reducer as the combiner; my cluster has 6 machines, n