Ok, understood now :) About the parameters:
It's a 50000x100 dense matrix, so I set the --numberOfColumns parameter to 100, and the rest have the default values (this means that maxSimilaritiesPerRow is set to 100, but I don't know which 100 it will return...)

2010/12/20 Sebastian Schelter <[email protected]>

> Hi,
>
> Most of Mahout's algorithm implementations need to run a series of
> map/reduce jobs to compute their results. By specifying a start and end
> phase you can make the implementation run only some of these internal
> jobs. You could e.g. use this to restart a failed execution.
>
> --sebastian
>
> On 20.12.2010 12:41, Fernando Fernández wrote:
>
>> But, does this affect the result? What will I get if I launch
>> RowSimilarity (cosine similarity) with --startphase=1 and --endPhase=2?
>> I don't fully understand what "phases" exactly are in this case.
>>
>> 2010/12/20 Niall Riddell<[email protected]>
>>
>>> Startphase and endphase shouldn't impact overall performance in any
>>> way, however it does mean that you can start at a later stage in a job
>>> pipeline.
>>>
>>> You can execute specific MR jobs by designating a startphase and
>>> endphase. It goes without saying that the correct inputs must be
>>> available to start a phase correctly.
>>>
>>> The first MR job is index 0. So setting --startPhase 1 will execute
>>> the 2nd job onwards. Putting in --endPhase 2 would stop after the 3rd
>>> job.
>>>
>>> On 20 Dec 2010 11:17, "Fernando Fernández"<[email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> Can anyone explain what exactly these two parameters (startphase and
>>>> endphase) are and how to use them? I'm trying to launch a
>>>> RowSimilarity job on a 50K row matrix (100 columns) with cosine
>>>> similarity and default startphase and endphase parameters and I'm
>>>> getting extremely poor performance on a quite big cluster (after 16
>>>> hours, it had only reached 3% of the process) and I think that this
>>>> could have something to do with the startphase and endphase
>>>> parameters. What do you think? How do these parameters affect the
>>>> RowSimilarity job?
>>>>
>>>> Thanks in advance.
>>>> Fernando.
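
For anyone reading this thread later, the phase indexing Niall describes (first job is index 0, --startPhase 1 runs from the 2nd job onwards, --endPhase 2 stops after the 3rd job) can be illustrated with a small Python sketch. This is not Mahout code: the job names and the select_phases helper are hypothetical, and the number of jobs in the RowSimilarity pipeline is not stated in the thread, so four placeholder jobs are assumed.

```python
# Hypothetical placeholder jobs; the real pipeline's job names and count
# are not given in the thread.
JOBS = ["job-0", "job-1", "job-2", "job-3"]


def select_phases(jobs, start_phase=None, end_phase=None):
    """Return the jobs that run for a 0-based, inclusive phase range.

    Leaving start_phase or end_phase unset means "from the first job"
    or "through the last job" respectively (assumed default behaviour).
    """
    first = 0 if start_phase is None else start_phase
    last = len(jobs) - 1 if end_phase is None else end_phase
    return [job for index, job in enumerate(jobs) if first <= index <= last]


print(select_phases(JOBS))                            # defaults: every job runs
print(select_phases(JOBS, start_phase=1))             # 2nd job onwards
print(select_phases(JOBS, start_phase=1, end_phase=2))  # 2nd and 3rd jobs only
```

Under these assumptions the defaults select every job, which matches the point made above that leaving the two parameters unset should not change what is computed. A skipped job's output still has to exist on disk for the later phases to read it.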
