Startphase and endphase shouldn't affect overall performance in any way; they only let you start at a later stage in a job pipeline.
You can execute specific MR jobs by designating a startphase and endphase. It goes without saying that the correct inputs must be available for a phase to start correctly. The first MR job has index 0, so setting --startPhase 1 will execute the 2nd job onwards, and setting --endPhase 2 will stop after the 3rd job.

On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:
> Hello everyone,
>
> Can anyone explain what exactly these two parameters (startphase and
> endphase) are and how to use them? I'm trying to launch a RowSimilarity job on a
> 50K row matrix (100 columns) with cosine similarity and default startphase
> and endphase parameters, and I'm getting extremely poor performance on
> quite a big cluster (after 16 hours, it has only reached 3% of the process). I
> think that this could have something to do with the startphase and endphase
> parameters. What do you think? How do these parameters affect the
> RowSimilarity job?
>
> Thanks in advance.
> Fernando.
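
As a rough illustration of the phase indexing described in this reply, here is a minimal Python sketch. It is not Mahout code: the function name `phases_to_run` and the four-job pipeline in the usage lines are made up for the example. The only assumptions taken from the thread are that jobs are indexed from 0 and that the end phase is inclusive.

```python
def phases_to_run(num_jobs, start_phase=0, end_phase=None):
    """Return the 0-based indices of the jobs inside the phase window.

    Hypothetical helper for illustration only; it mirrors the rule
    "run job i if start_phase <= i <= end_phase".
    """
    if end_phase is None:
        # No end phase given: run through to the last job.
        end_phase = num_jobs - 1
    return [i for i in range(num_jobs) if start_phase <= i <= end_phase]


# Assumed pipeline of 4 jobs (the real number depends on the job being run).
print(phases_to_run(4, start_phase=1))               # 2nd job onwards
print(phases_to_run(4, end_phase=2))                 # stops after the 3rd job
print(phases_to_run(4, start_phase=1, end_phase=2))  # only the 2nd and 3rd jobs
```

Skipped phases are simply not executed, which is why the outputs of the earlier jobs must already exist when starting at a later phase.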
