Re: Any general performance tips for job RowSimilarityJob-CooccurrencesMapper-SimilarityReducer?

Dan Brickley Tue, 18 Oct 2011 01:24:41 -0700

2011/10/18 Sebastian Schelter:
> Hi Ramon,
>
> my first suggestion would be to use Mahout 0.6 as significant
> improvements have been made to RowSimilarityJob and the 0.5 version has
> known bugs.
>
> The runtime of RowSimilarityJob is not only determined by the size of
> the input but also by the distribution of the interactions among the
> users.


As an aside, I've noticed this 'users' terminology lurking in the
background of RowSimilarityJob (e.g. in JIRA discussion).

My use of it last week seemed perfectly reasonable; but rows were
books (or bibliographic records), with feature columns from library
topic codes. Does the 'user' terminology suggest it's really focussed
on recommendations?
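
To make the setup concrete, here is a minimal sketch of the kind of
input I mean, in plain Python rather than Mahout. The book ids, the
topic codes and the choice of cosine as the measure are illustrative
assumptions; this is not a claim about what RowSimilarityJob does
internally. It only shows that a row-to-row similarity needs nothing
user-like about the rows.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: each row is a book, each column a library topic code.
# The ids, codes and weights are made up for illustration.
rows = {
    "book_a": {"QA76": 1.0, "Z699": 1.0},
    "book_b": {"QA76": 1.0, "Z699": 1.0, "HB71": 1.0},
    "book_c": {"HB71": 1.0},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Compare every pair of rows; nothing here depends on rows being users.
similarities = {
    (a, b): cosine(rows[a], rows[b])
    for a, b in combinations(sorted(rows), 2)
}

for pair, score in sorted(similarities.items()):
    print(pair, round(score, 3))
```

Swapping books for users and topic codes for items would leave the
code unchanged, which is really what prompts the question.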

I'm used to seeing this in the Taste part of Mahout, where sometimes
it's suggested we can re-use recommender pieces by thinking more
broadly, e.g. 'recommending topics to books' or vice versa. This makes
sense but introduces an extra layer of conceptual confusion. Is there
any important sense in which rows (or columns?) in RowSimilarityJob
ought to be thought of as users? Or the values/weights as preferences?

cheers,

Dan
