Re: RowSimilarity startphase and endphase parameters

Fernando Fernández Mon, 20 Dec 2010 03:42:17 -0800

But does this affect the result? What will I get if I launch RowSimilarity
(cosine similarity) with --startPhase 1 and --endPhase 2? I don't fully
understand exactly what the "phases" are in this case.


2010/12/20 Niall Riddell <[email protected]>

> Startphase and endphase shouldn't impact overall performance in any way;
> they just mean that you can start at a later stage in a job pipeline.
>
> You can execute specific MR jobs by designating a startphase and endphase.
> It goes without saying that the correct inputs must be available to start a
> phase correctly.
>
> The first MR job is index 0.  So setting --startPhase 1 will execute the
> 2nd job onwards.  Putting in --endPhase 2 would stop after the 3rd job.
>
> On 20 Dec 2010 11:17, "Fernando Fernández" <
> [email protected]> wrote:
> > Hello everyone,
> >
> > Can anyone explain exactly what these two parameters (startphase and
> > endphase) are and how to use them? I'm trying to launch a RowSimilarity
> > job on a 50K row matrix (100 columns) with cosine similarity and default
> > startphase and endphase parameters, and I'm getting extremely poor
> > performance on a quite big cluster (after 16 hours, it had only reached
> > 3% of the process). I think that this could have something to do with
> > the startphase and endphase parameters. What do you think? How do these
> > parameters affect the RowSimilarity job?
> >
> > Thanks in advance.
> > Fernando.
>
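
The phase numbering described in the quoted reply can be sketched in a few
lines. This is only an illustration, not Mahout's code: the phases_to_run
helper and the job names are hypothetical, and it assumes nothing beyond what
the reply states, namely that the first MR job is index 0 and that the start
and end phases are both inclusive.

```python
def phases_to_run(jobs, start_phase=0, end_phase=None):
    """Return the (index, name) pairs selected by an inclusive phase window.

    Hypothetical helper: it models the numbering described in the thread,
    where the first job in the pipeline is phase 0.
    """
    if end_phase is None:
        end_phase = len(jobs) - 1  # default: run through the last job
    return [
        (index, name)
        for index, name in enumerate(jobs)
        if start_phase <= index <= end_phase
    ]


# Placeholder job names; the real pipeline's jobs are not listed in the thread.
pipeline = ["job-0", "job-1", "job-2", "job-3"]

# Equivalent of --startPhase 1 --endPhase 2: only the 2nd and 3rd jobs run.
selected = phases_to_run(pipeline, start_phase=1, end_phase=2)
print(selected)
```

Under these assumptions, a run with --startPhase 1 and --endPhase 2 skips the
first job, so that job's output must already be in place, and it stops after
the third job, so any later jobs in the pipeline would not run.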
