RowSimilarityJob computes the top-k most similar rows for each row of
the input matrix. You can think of it as computing A'A and sparsifying
the result afterwards. It also lets you plug in a similarity measure
of your choice.
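
To make the "top-k rows with a pluggable measure" idea concrete, here is
a small in-memory sketch. It is not Mahout code: the class name, the
method names and the choice of cosine as the example measure are all
made up for illustration, and it ignores everything the real job does to
scale (thresholds, sampling, MapReduce passes).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical in-memory illustration of top-k row similarity; not Mahout code. */
final class TopKRowSimilaritySketch {

    /** Cosine similarity, used here only as one example of a pluggable measure. */
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0 : dot / Math.sqrt(nx * ny);
    }

    /**
     * For each row, returns the indices of the k most similar other rows,
     * most similar first. Pairs with zero similarity are dropped, which is
     * the "sparsifying" step in this sketch.
     */
    static List<List<Integer>> topK(double[][] rows, int k,
                                    ToDoubleBiFunction<double[], double[]> measure) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            final double[] sims = new double[rows.length];
            List<Integer> candidates = new ArrayList<>();
            for (int j = 0; j < rows.length; j++) {
                if (j == i) {
                    continue; // skip self-similarity (a choice made by this sketch)
                }
                sims[j] = measure.applyAsDouble(rows[i], rows[j]);
                if (sims[j] > 0) {
                    candidates.add(j);
                }
            }
            // Sort by descending similarity; the sort is stable, so ties keep index order.
            candidates.sort(Comparator.comparingDouble((Integer j) -> -sims[j]));
            int keep = Math.min(k, candidates.size());
            result.add(new ArrayList<>(candidates.subList(0, keep)));
        }
        return result;
    }
}
```

Calling TopKRowSimilaritySketch.topK(rows, 10, TopKRowSimilaritySketch::cosine)
keeps at most 10 neighbours per row. Swapping the third argument for
another function is the in-memory analogue of choosing a different
similarity class.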

If you want a co-occurrence matrix, you can use
o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
as the similarity measure.
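
As a rough illustration of why a co-occurrence count gives you the
entries of A'A: if the input rows are items and the columns are users
(an assumption of this sketch), the count for items i and j is just the
number of users that appear in both rows. The class and method names
below are hypothetical and the code is plain in-memory Java, not the
Mahout implementation.

```java
/** Hypothetical in-memory illustration of co-occurrence counting; not Mahout code. */
final class CooccurrenceCountSketch {

    /**
     * rowsByItem[i][u] is nonzero when user u interacted with item i
     * (items as rows, users as columns; an assumption of this sketch).
     * Returns C where C[i][j] is the number of users who interacted with
     * both item i and item j. The diagonal C[i][i] is kept and holds the
     * number of users who interacted with item i.
     */
    static int[][] cooccurrence(int[][] rowsByItem) {
        int n = rowsByItem.length;
        int[][] c = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int count = 0;
                for (int u = 0; u < rowsByItem[i].length; u++) {
                    if (rowsByItem[i][u] != 0 && rowsByItem[j][u] != 0) {
                        count++;
                    }
                }
                c[i][j] = count;
            }
        }
        return c;
    }
}
```

The result here is dense and symmetric. The real job would additionally
keep only the top entries per row.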


On 02.04.2013 23:43, Pat Ferrel wrote:
> Taking an idea from Ted, I'm working on a cross recommender starting from 
> Mahout's MapReduce implementation of an item-based recommender. We have purchases 
> and views for items by user. It is straightforward to create a recommender on 
> purchases but using views as a predictor of purchases does not work so 
> well--giving us lower precision scores. This is, no doubt, because the events 
> have a lot of noise, views that do not lead to purchases.
> 
> To help solve this Ted suggests we think of a recommender in two parts:
> 
> [B'B]h_p = r_p  <== standard item-based recommender using purchases
> [B'A]h_v = r_v  <== cross-recommender using views and purchases
> r = r_p + r_v   <== full recommendation vector: linear combination of the two parts
> 
> These both make recommendations for purchases but method 2 makes cross 
> recommendations based on views. [B'A] is the co-occurrence matrix of views 
> with purchases. 
> 
> From RecommenderJob the 'similarity matrix' is created by:
> 
>   // calculate the co-occurrence matrix
>   ToolRunner.run(getConf(), new RowSimilarityJob(), new String[]{
>       "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
>       "--output", similarityMatrixPath.toString(),
>       "--similarityClassname", similarityClassname,
>       …
> 
> What is the role of RowSimilarityJob here and how does it lead to a 
> co-occurrence matrix? I understand that in the general recommender the 
> co-occurrence matrix is symmetric so columns = rows. Is the co-occurrence 
> matrix actually calculated anywhere in the standard recommender?
> 
> The output of PreparePreferenceMatrixJob is a DistributedRowMatrix. As a 
> first cut it seems I can do the cross recommender part of the work by:
> 
>   // calculate the 'cross' co-occurrence matrix
>   B = PreparePreferenceMatrixJob using user purchase prefs
>   A = PreparePreferenceMatrixJob using user view prefs
>   // note that users and items must be the same for A and B; their ids must map to the same things
>   B' = TransposeJob on B
>   [B'A] = MatrixMultJob on B', A
>   [B'A]h_v by using the partial multiply process in the standard Recommender
>   extract the needed recs
> 
> Questions:
>  *  Item similarities matter to me perhaps even more than user-history-based 
> recs. I use the [B'A] columns for this, right? Shouldn't I run 
> RowSimilarityJob on [B'A]'?
>  *  There are assumptions in some code that the co-occurrence matrix is 
> symmetric and so rows = columns. This is not true of the 'cross' 
> co-occurrence matrix. Are there places I need to account for this?
> 
