But does this affect the result? What will I get if I launch RowSimilarity (cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully understand what exactly "phases" are in this case.

2010/12/20 Niall Riddell <[email protected]>

> Startphase and endphase shouldn't impact overall performance in any way;
> however, they do mean that you can start at a later stage in a job pipeline.
>
> You can execute specific MR jobs by designating a startphase and endphase.
> It goes without saying that the correct inputs must be available to start a
> phase correctly.
>
> The first MR job is index 0. So setting --startPhase 1 will execute the 2nd
> job onwards. Putting in --endPhase 2 would stop after the 3rd job.
>
> On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:
>
> > Hello everyone,
> >
> > Can anyone explain what exactly these two parameters (startphase and
> > endphase) are and how to use them? I'm trying to launch a RowSimilarity
> > job on a 50K row matrix (100 columns) with cosine similarity and default
> > startphase and endphase parameters, and I'm getting extremely poor
> > performance on quite a big cluster (after 16 hours, it had only reached
> > 3% of the process). I think that this could have something to do with
> > the startphase and endphase parameters. What do you think? How do these
> > parameters affect the RowSimilarity job?
> >
> > Thanks in advance.
> > Fernando.
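
To make the indexing in Niall's reply concrete, here is a minimal sketch of how a start/end phase pair selects jobs from a pipeline. It is not Mahout's code: the function name `phases_to_run` and the four-job pipeline are hypothetical, and it assumes both bounds are inclusive and 0-based, as the reply describes.

```python
def phases_to_run(num_phases, start_phase=None, end_phase=None):
    """Return the 0-based indices of the jobs that would execute.

    Assumption: both bounds are inclusive, and a missing bound means
    "from the first job" or "through the last job" respectively.
    """
    first = 0 if start_phase is None else start_phase
    last = num_phases - 1 if end_phase is None else end_phase
    return [i for i in range(num_phases) if first <= i <= last]


# Hypothetical pipeline of four MapReduce jobs (the real job count may differ).
print(phases_to_run(4, start_phase=1, end_phase=2))  # [1, 2]: the 2nd and 3rd jobs
print(phases_to_run(4))                              # [0, 1, 2, 3]: the default, every job
```

With --startPhase 1 the first job (index 0) is skipped, so whatever it would have written must already be available from an earlier run; otherwise the second job has no valid input.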
