Re: Understanding similarities computation in RecommenderJob

Stefano Bellasio Tue, 18 Jan 2011 03:02:03 -0800

Thank you Sebastian. Just a few questions to make sure I understand everything 
(I'm looking for RowSimilarityJob in my Mahout installation (0.4) but can't 
find it; where is it?):
1) Does RowSimilarityJob use a chosen similarity (for example, Euclidean 
Distance) in the first step?
2) Then, for each pair of items that has a co-occurrence in the co-occurrence 
matrix, does it compute a similarity value with, for example, Euclidean Distance?


I'm not sure about this. Thank you again.

On 18 Jan 2011, at 11:32, Sebastian Schelter wrote:

> Hi Stefano,
> 
> AFAIK the chapter about distributed recommenders in Mahout in Action has not 
> yet been updated to the latest version of RecommenderJob; maybe that's the 
> source of your confusion.
> 
> I'll try to give a brief explanation of the similarity computation. Feel free 
> to ask more questions if anything remains unclear.
> 
> RecommenderJob starts ItemSimilarityJob, which creates an item x user matrix 
> from the preference data and uses RowSimilarityJob to compute the pairwise 
> similarities of the rows of this matrix (the items). So the best place to 
> start is RowSimilarityJob.
> 
> RowSimilarityJob uses an implementation of DistributedVectorSimilarity to 
> compute the similarities in two phases. In the first phase, each item vector 
> is shown to the similarity implementation, which can compute a "weight" for 
> it. In the second phase, for every pair of rows that has at least one 
> cooccurrence, the method similarity(...) is called with the previously 
> computed weights and a list of all cooccurring values. This generic approach 
> allows us to use different implementations of DistributedVectorSimilarity, so 
> we can support a wide range of similarity functions.
> 
> A simplified version of this algorithm is also explained in the slides of a 
> talk I gave at the Hadoop Get Together; maybe that's helpful too: 
> http://www.slideshare.net/sscdotopen/mahoutcf
> 
> --sebastian
> 
> 
> 
> On 18.01.2011 11:12, Stefano Bellasio wrote:
>> Hi guys, I'm trying to understand how RecommenderJob works. Until now I 
>> thought it was necessary to choose a particular similarity class, such as 
>> Euclidean Distance, so that the algorithm could compute the similarity for 
>> each pair of items and produce recommendations. After reading "Distributing 
>> a Recommender" in Mahout in Action, I now have some doubts about the 
>> relationship between similarities like Euclidean, LogLike, and Cosine and 
>> the co-occurrence matrix. I see that in RecommenderJob I can also specify 
>> "Co-occurrence" as a similarity class, so what is the correct way to compute 
>> similarities, and how does this work with the other similarity classes and 
>> the co-occurrence matrix/similarity? Thank you very much for any further 
>> explanations :)
>
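
The two-phase scheme quoted above can be illustrated with a small in-memory
sketch. This is not Mahout code and does not use the DistributedVectorSimilarity
interface: the function names, the toy preference data, and the choice of cosine
similarity (weight = vector norm) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each entry is one row of the item x user matrix,
# stored sparsely as {user: preference}.
rows = {
    "item_a": {"u1": 1.0, "u2": 2.0},
    "item_b": {"u1": 2.0, "u2": 4.0},
    "item_c": {"u3": 5.0},
}


def weight(row):
    """Phase 1: per-row weight. For cosine this is the Euclidean norm."""
    return sqrt(sum(v * v for v in row.values()))


def similarity(cooccurrences, weight_a, weight_b):
    """Phase 2: combine the cooccurring values with the two row weights."""
    dot = sum(a * b for a, b in cooccurrences)
    return dot / (weight_a * weight_b)


def row_similarities(rows):
    # Phase 1: show every row to the similarity measure once.
    weights = {item: weight(row) for item, row in rows.items()}

    # Regroup the matrix by column (user) to find cooccurring rows.
    columns = defaultdict(dict)
    for item, row in rows.items():
        for user, value in row.items():
            columns[user][item] = value

    # Collect the cooccurring values for each pair that shares a column.
    cooccurrences = defaultdict(list)
    for entries in columns.values():
        for a, b in combinations(sorted(entries), 2):
            cooccurrences[(a, b)].append((entries[a], entries[b]))

    # Phase 2: only pairs with at least one cooccurrence are scored.
    return {
        (a, b): similarity(values, weights[a], weights[b])
        for (a, b), values in cooccurrences.items()
    }


result = row_similarities(rows)
```

Here item_c shares no user with the other items, so no pair containing it is
ever scored. A different measure would only need its own weight and similarity
functions; the pairing logic stays the same.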
