Thank you Sebastian, just a few questions to make sure I understand everything. (I'm looking for RowSimilarityJob in my Mahout installation (0.4) but without success; where can I find it?)

1) So RowSimilarityJob uses a chosen similarity (for example Euclidean Distance) in the first step?
2) Then, for each pair of items that has a co-occurrence in the co-occurrence matrix, it computes a similarity value, with Euclidean Distance for example?

I'm not sure about that. Thank you again.

On 18 Jan 2011, at 11:32, Sebastian Schelter wrote:

> Hi Stefano,
>
> AFAIK the chapter about distributed recommenders in Mahout in Action has not
> yet been updated to the latest version of RecommenderJob; maybe that's the
> source of your confusion.
>
> I'll try to give a brief explanation of the similarity computation; feel free
> to ask more questions if things aren't clear.
>
> RecommenderJob starts ItemSimilarityJob, which creates an item x user matrix
> from the preference data and uses RowSimilarityJob to compute the pairwise
> similarities of the rows of this matrix (the items). So the best place to
> start is looking at RowSimilarityJob.
>
> RowSimilarityJob uses an implementation of DistributedVectorSimilarity to
> compute the similarities in two phases. In the first phase, each item vector
> is shown to the similarity implementation, which can compute a "weight" for
> it. In the second phase, for all pairs of rows that have at least one
> cooccurrence, the method similarity(...) is called with the previously
> computed weights and a list of all cooccurring values. This generic approach
> allows us to use different implementations of DistributedVectorSimilarity,
> so we can support a wide range of similarity functions.
>
> A simplified version of this algorithm is also explained in the slides of a
> talk I gave at the Hadoop Get Together; maybe that's helpful too:
> http://www.slideshare.net/sscdotopen/mahoutcf
>
> --sebastian
>
> On 18.01.2011 11:12, Stefano Bellasio wrote:
>> Hi guys, I'm trying to understand how RecommenderJob works. Until now I
>> thought it was necessary to choose a particular similarity class, such as
>> Euclidean Distance, so that the algorithm could compute the similarity
>> for each pair of items and produce recommendations.
>> Reading "Distributing a Recommender" in Mahout in Action, I now have some
>> doubts about the relationship between similarities like Euclidean, LogLike
>> and Cosine and the co-occurrence matrix. As I see in RecommenderJob, I can
>> also specify "Co-occurrence" as a similarity class, so what is the correct
>> way to compute similarities, and how does this work with the other
>> similarity classes and the co-occurrence matrix/similarity? Thank you very
>> much for your further explanations :)
>
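
For reference, the two-phase scheme Sebastian describes (a per-row "weight" first, then a similarity call for every pair of rows with at least one cooccurrence) can be sketched in a few lines. This is only an illustrative in-memory sketch in Python, not Mahout's code: the preference data, the class names `CosineSimilarity` and `CooccurrenceCount`, and the functions `item_rows` and `row_similarities` are all hypothetical, and the real job runs as MapReduce passes rather than a loop over pairs.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical preference data: (user, item, rating) triples.
prefs = [
    ("u1", "A", 5.0), ("u1", "B", 3.0),
    ("u2", "A", 4.0), ("u2", "B", 4.0), ("u2", "C", 1.0),
    ("u3", "B", 2.0), ("u3", "C", 5.0),
]


def item_rows(prefs):
    """Build the item x user matrix as {item: {user: rating}}."""
    rows = defaultdict(dict)
    for user, item, rating in prefs:
        rows[item][user] = rating
    return rows


class CosineSimilarity:
    """Illustrative measure: the weight is the Euclidean norm of the row."""

    def weight(self, row):
        # Phase 1: one number per item vector.
        return sqrt(sum(v * v for v in row.values()))

    def similarity(self, cooccurrences, weight_a, weight_b):
        # Phase 2: only cooccurring values contribute to the dot product.
        dot = sum(a * b for a, b in cooccurrences)
        return dot / (weight_a * weight_b)


class CooccurrenceCount:
    """Illustrative measure: similarity is the number of cooccurrences."""

    def weight(self, row):
        return 0.0  # this measure needs no per-row weight

    def similarity(self, cooccurrences, weight_a, weight_b):
        return float(len(cooccurrences))


def row_similarities(rows, measure):
    # Phase 1: show every row to the measure and keep its weight.
    weights = {item: measure.weight(row) for item, row in rows.items()}
    # Phase 2: call similarity(...) for each pair sharing at least one user.
    result = {}
    for a, b in combinations(sorted(rows), 2):
        shared = rows[a].keys() & rows[b].keys()
        if not shared:
            continue  # no cooccurrence, so the pair is never compared
        cooccurrences = [(rows[a][u], rows[b][u]) for u in sorted(shared)]
        result[(a, b)] = measure.similarity(
            cooccurrences, weights[a], weights[b])
    return result


if __name__ == "__main__":
    rows = item_rows(prefs)
    print(row_similarities(rows, CooccurrenceCount()))
    print(row_similarities(rows, CosineSimilarity()))
```

In this sketch, plain co-occurrence counting is just one more measure plugged into the same two phases, which is one way to read why "Co-occurrence" can appear as a similarity class next to the others.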