Ok, understood now :)

About the parameters:

It's a 50000x100 dense matrix, so I set the --numberOfColumns parameter to
100, and the rest have the default values (this means that
maxSimilaritiesPerRow is set to 100, but I don't know which 100 it will
return...)
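
To check my understanding of the phases described in the quoted replies
below, here is a minimal sketch of the indexing only. It is not Mahout's
code: the function name phases_to_run and the four placeholder job names
are made up for illustration, and the real RowSimilarity pipeline may have
a different number of jobs.

```python
def phases_to_run(job_names, start_phase=0, end_phase=None):
    """Return the jobs whose 0-based index lies in [start_phase, end_phase]."""
    # Assumption: leaving end_phase unset means "run through the last job".
    if end_phase is None:
        end_phase = len(job_names) - 1
    return [name for index, name in enumerate(job_names)
            if start_phase <= index <= end_phase]


# Hypothetical pipeline; these are placeholder names, not Mahout's job names.
pipeline = ["job0", "job1", "job2", "job3"]

# --startPhase 1 --endPhase 2: skip the first job, stop after the third.
print(phases_to_run(pipeline, start_phase=1, end_phase=2))
```

With the defaults, nothing is skipped, so every job in the pipeline runs.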

2010/12/20 Sebastian Schelter <[email protected]>

> Hi,
>
> Most of Mahout's algorithm implementations need to run a series of
> map/reduce jobs to compute their results. By specifying a startPhase and
> endPhase you can make the implementation run only some of these internal
> jobs. You could e.g. use this to restart a failed execution.
>
> --sebastian
>
>
>
> On 20.12.2010 12:41, Fernando Fernández wrote:
>
>> But, does this affect the result? What will I get if I launch RowSimilarity
>> (cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully
>> understand what "phases" exactly are in this case.
>>
>> 2010/12/20 Niall Riddell <[email protected]>
>>
>>> startPhase and endPhase shouldn't impact overall performance in any way;
>>> however, they do mean that you can start at a later stage in a job
>>> pipeline.
>>>
>>> You can execute specific MR jobs by designating a startPhase and endPhase.
>>> It goes without saying that the correct inputs must be available to start
>>> a phase correctly.
>>>
>>> The first MR job is index 0. So setting --startPhase 1 will execute the
>>> 2nd job onwards. Putting in --endPhase 2 would stop after the 3rd job.
>>> On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> Can anyone explain what exactly these two parameters (startPhase and
>>>> endPhase) are and how to use them? I'm trying to launch a RowSimilarity
>>>> job on a 50K row matrix (100 columns) with cosine similarity and default
>>>> startPhase and endPhase parameters, and I'm getting extremely poor
>>>> performance on quite a big cluster (after 16 hours, it had only reached
>>>> 3% of the process). I think that this could have something to do with
>>>> the startPhase and endPhase parameters. What do you think? How do these
>>>> parameters affect the RowSimilarity job?
>>>>
>>>> Thanks in advance.
>>>> Fernando.
>>>>
>>>
>
