2011/10/18 Sebastian Schelter:
> Hi Ramon,
>
> my first suggestion would be to use Mahout 0.6 as significant
> improvements have been made to RowSimilarityJob and the 0.5 version has
> known bugs.
>
> The runtime of RowSimilarityJob is not only determined by the size of
> the input but also by the distribution of the interactions among the
> users.
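A minimal in-memory sketch of why the distribution matters, not Mahout's actual implementation (which runs as Hadoop jobs). It assumes a co-occurrence style computation in which every pair of rows sharing a column contributes to that pair's similarity. A column with n nonzero entries then emits n(n-1)/2 row pairs, so a few very dense columns can dominate the cost even when the total input size stays the same. The function name `row_similarities`, the dict-of-dicts input, the choice of cosine similarity, and the topic codes in the usage are hypothetical. Note that nothing in the computation depends on rows being users: here they are books and the columns are topic codes.

```python
from collections import defaultdict
from math import sqrt


def row_similarities(rows):
    """Cosine similarity between rows of a sparse matrix (hypothetical helper).

    rows: dict mapping row_id -> {col_id: weight}.
    Returns (similarities keyed by ordered row-id pair, number of pairs emitted).
    """
    # Invert to a column view: col_id -> [(row_id, weight), ...].
    # Zero weights are skipped, so every row that appears has a nonzero norm.
    columns = defaultdict(list)
    for r, feats in rows.items():
        for c, w in feats.items():
            if w != 0:
                columns[c].append((r, w))

    norms = {r: sqrt(sum(w * w for w in feats.values())) for r, feats in rows.items()}

    dots = defaultdict(float)
    pair_work = 0
    for entries in columns.values():
        # Each column with n entries emits n*(n-1)/2 row pairs: quadratic in n.
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                (a, wa), (b, wb) = entries[i], entries[j]
                key = (a, b) if a < b else (b, a)
                dots[key] += wa * wb
                pair_work += 1

    sims = {k: v / (norms[k[0]] * norms[k[1]]) for k, v in dots.items()}
    return sims, pair_work


# Usage: rows are books, columns are (hypothetical) library topic codes.
books = {
    "book1": {"QA76": 1.0, "Z699": 1.0},
    "book2": {"QA76": 1.0},
    "book3": {"Z699": 1.0, "PR": 1.0},
}
sims, work = row_similarities(books)
print(round(sims[("book1", "book2")], 4), work)  # 0.7071 2
```

With the same four nonzero entries, a single column holding all of them emits 6 pairs, while two columns of two entries each emit only 2; that is the sense in which skewed "interactions" drive the runtime.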
As an aside, I've noticed this 'users' terminology lurking in the background of RowSimilarityJob (e.g. in JIRA discussion). My use of it last week seemed perfectly reasonable, but there the rows were books (or bibliographic records), with feature columns taken from library topic codes.

Does the 'user' terminology suggest it's really focussed on recommendations? I'm used to seeing this in the Taste part of Mahout, where it's sometimes suggested that we can re-use recommender pieces by, e.g., thinking more broadly and 'recommending topics to books' or vice versa. This makes sense, but it introduces an extra layer of conceptual confusion.

Is there any important sense in which rows (or columns?) in RowSimilarityJob ought to be thought of as users? Or the values/weights as preferences?

cheers,
Dan
