Startphase and endphase shouldn't impact overall performance in any way.
They do mean, however, that you can start at a later stage in a job pipeline.

You can execute specific MR jobs by designating a startphase and endphase.
It goes without saying that the correct inputs must be available to start a
phase correctly.

The first MR job is index 0.  So setting --startPhase 1 will execute the 2nd
job onwards.  Putting in --endPhase 2 would stop after the 3rd job.
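
To make the indexing concrete, here is a minimal Java sketch. It is not
Mahout's actual code: the class PhaseGate and its methods are hypothetical
names made up for this example, and it assumes both bounds are inclusive and
that leaving them at their defaults runs every job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
final class PhaseGate {

    // A job at 0-based index `phase` runs only if startPhase <= phase <= endPhase.
    static boolean shouldRun(int phase, int startPhase, int endPhase) {
        return phase >= startPhase && phase <= endPhase;
    }

    // Returns the indices of the jobs that would be executed in a pipeline
    // of `numJobs` jobs for the given phase bounds.
    static List<Integer> phasesToRun(int numJobs, int startPhase, int endPhase) {
        List<Integer> selected = new ArrayList<>();
        for (int phase = 0; phase < numJobs; phase++) {
            if (shouldRun(phase, startPhase, endPhase)) {
                selected.add(phase);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Pipeline of 4 jobs (indices 0..3), as with --startPhase 1 --endPhase 2:
        // the 2nd and 3rd jobs run, the 1st and 4th are skipped.
        System.out.println(phasesToRun(4, 1, 2)); // prints [1, 2]
        // Assumed defaults: no job is skipped.
        System.out.println(phasesToRun(4, 0, Integer.MAX_VALUE)); // prints [0, 1, 2, 3]
    }
}
```

Skipped jobs do no work at all, which is why a later phase only runs
correctly if the output of the earlier jobs is already in place.
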
On 20 Dec 2010 11:17, "Fernando Fernández" <
[email protected]> wrote:
> Hello everyone,
>
> Can anyone explain exactly what these two parameters (startphase and
> endphase) are and how to use them? I'm trying to launch a RowSimilarity job
> on a 50K row matrix (100 columns) with cosine similarity and the default
> startphase and endphase parameters, and I'm getting extremely poor
> performance on a quite big cluster (after 16 hours, it had only reached 3%
> of the process). I think this could have something to do with the
> startphase and endphase parameters. What do you think? How do these
> parameters affect the RowSimilarity job?
>
> Thanks in advance.
> Fernando.
