-ui has to offer, it doesn't take
long to learn how to skim it and get a 10x more accurate reading on your job
progress.
Matt
-Original Message-
From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Sunday, September 18, 2011 11:27 PM
To: common-user@hadoop.apache.org
Subject: Re: p
Hi Chen,
yes, it saves time to move map() output to the nodes where they will be needed
for the reduce() input. After map() has processed the first blocks, it makes
sense to copy that output to the reduce nodes. Imagine a very large map()
output. If shuffle would be postponed after all map nod
Or we can just separate shuffle from the reduce stage and integrate it into the
map stage. Then we can clearly differentiate the map stage (before shuffle
finishes) and the reduce stage (after shuffle finishes).
On Mon, Sep 19, 2011 at 1:20 AM, He Chen wrote:
> Hi Kai
>
> Thank you for the reply.
>
> The reduce() will not start because the shuffle phase does not finish.
Hi Kai
Thank you for the reply.
The reduce() will not start because the shuffle phase has not finished, and
the shuffle phase will not finish until all mappers end.
I am curious about the design purpose of overlapping the map and reduce
stages. Was this only for saving shuffling time? Or the
Hi Chen,
the times when nodes run instances of the map and reduce tasks do overlap, but
map() and reduce() execution will not. Reduce nodes will start copying data
from map nodes; that's the shuffle phase. And the map nodes are still running
during that copy phase. My observation had been tha
Hi Arun
I have a question. Do you know the reason that Hadoop allows the map and
reduce stages to overlap? Or does anyone else know? Thank you in advance.
Chen
On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy wrote:
> Nan,
>
> The 'phase' is implicitly understood by the 'progress' (value) made by the
> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
Hi Arun,
Thanks!
As you explained, in Hadoop we cannot explicitly divide a job into two
phases, map and reduce; only for a reduce task can we judge which stage
it is in (shuffle, sort, reduce). (With 0.23, we can also do it with
mappers.)
Right?
Nan
On Mon, Sep 19, 2011 at 12:17 PM, Arun C Murthy wrote:
Agreed.
At least, I believe the new web-ui for MRv2 is (or will be soon) more verbose
about this.
On Sep 18, 2011, at 9:23 PM, Kai Voigt wrote:
> Hi,
>
> these 0-33-66-100% phases are really confusing to beginners. We see that in
> our training classes. The output should be more verbose, such as breaking
> down the phases into separate progress numbers.
Hi,
these 0-33-66-100% phases are really confusing to beginners. We see that in our
training classes. The output should be more verbose, such as breaking down the
phases into separate progress numbers.
Does that make sense?
On 19.09.2011 at 06:17, Arun C Murthy wrote:
> Nan,
>
> The 'phase' is implicitly understood by the 'progress' (value) made by the
> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
Nan,
The 'phase' is implicitly understood by the 'progress' (value) made by the
map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
For example, for a reduce task:
0-33%   -> Shuffle
34-66%  -> Sort (actually just a 'merge'; there is no sort in the reduce since
all map-outputs are already sorted)
67-100% -> Reduce
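Arun's percentage-to-phase mapping can be sketched as a tiny helper. This is a hypothetical illustration, not Hadoop's actual TaskStatus code; the class, enum, and method names here are invented for the example:

```java
// Hypothetical sketch of how a reduce task's single progress value
// implicitly encodes its phase, per the 0-33/34-66/67-100 split above.
// Not Hadoop source; all names are made up for illustration.
public class ReducePhase {
    public enum Phase { SHUFFLE, SORT, REDUCE }

    // progress is the reduce task's overall progress in [0.0, 1.0]
    public static Phase phaseOf(double progress) {
        if (progress <= 1.0 / 3.0) return Phase.SHUFFLE;
        if (progress <= 2.0 / 3.0) return Phase.SORT; // really a merge
        return Phase.REDUCE;
    }

    public static void main(String[] args) {
        System.out.println(phaseOf(0.20)); // SHUFFLE
        System.out.println(phaseOf(0.50)); // SORT
        System.out.println(phaseOf(0.90)); // REDUCE
    }
}
```

This is exactly why the web-ui numbers confuse beginners: a reduce task sitting at "30%" has not run a single reduce() call yet, it is still shuffling.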
Hi Nan
I have had the same question for a while. In some research papers, people
like to make the reduce stage slow-start. In this way, the map stage and
reduce stage are easy to differentiate: you can use the number of remaining
unallocated map tasks to detect which stage your job is in.
To le
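Chen's detection idea can be sketched like this. This is a hypothetical illustration assuming reduce slow-start is in effect; the names are invented and are not a Hadoop API:

```java
// Hypothetical sketch of the stage-detection idea: with reduce slow-start,
// a job is still in its map stage while unallocated map tasks remain, and
// in its reduce stage once every map task has been handed out.
// Not a Hadoop API; names are invented for illustration.
public class StageDetector {
    public static String stageOf(int totalMapTasks, int allocatedMapTasks) {
        int unallocated = totalMapTasks - allocatedMapTasks;
        return unallocated > 0 ? "map" : "reduce";
    }

    public static void main(String[] args) {
        System.out.println(stageOf(100, 40));  // map
        System.out.println(stageOf(100, 100)); // reduce
    }
}
```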
Hi, all
recently, I was hit by a question: "how is a hadoop job divided into 2
phases?"
In textbooks, we are told that mapreduce jobs are divided into 2 phases,
map and reduce, and that reduce is further divided into 3 stages:
shuffle, sort, and reduce. But in the hadoop code, I never think