Hi Serega,

We have also tried the Mahout 0.9 RecommenderJob and found that its results are poor as well. We are now debugging the source code to track down possible issues. How is the output quality of Mahout 0.7? We will switch to that version if the results are acceptable. Thanks.

Best
Wei

On Tue, Nov 4, 2014 at 8:00 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:

> Hi, i used org.apache.mahout.cf.taste.hadoop.item.RecommenderJob in mahout
> 0.7 (CDH4)
> Here are parameters:
> numRecommendations=1000
> threshold=0.91
> maxSimilaritiesPerItem=1000
> maxPrefsPerUserInItemSimilarity=10
> similarityClassname=SIMILARITY_LOGLIKELIHOOD
>
> Then I migrated to 0.9 (CDH5)
> I've found one difference:
> maxPrefsPerUserInItemSimilarity renamed to maxPrefsInItemSimilarity
>
> The other thing is how it works.
> I see this output in 0.7:
>
> USER_RATINGS_NEGLECTED=14954083
> USER_RATINGS_USED=32355513
> =====
> COOCCURRENCES=72 503 210
> PRUNED_COOCCURRENCES=0
>
> output in 0.9:
>
> NEGLECTED_OBSERVATIONS=39 175 989
> ROWS=4 937 362
> USED_OBSERVATIONS=10 840 138
> =====
> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> COOCCURRENCES=17 645 029
> PRUNED_COOCCURRENCES=0
>
> And 0.9 gives me awful result, just trash.
> I run over the same dataset
> mahout 0.7 is on old production CDH4 cluster,
> mahout 0.9 is on new CDH5 cluster.
>
> Why there is so huge difference? Is there any possibility to fix it?
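
For reference, the gap between the two sets of counters quoted above can be quantified with a quick sketch. The helper name `used_fraction` is hypothetical, and the numbers are copied from the quoted job output; this only measures the difference and says nothing about its cause.

```python
# Counter values copied from the quoted RecommenderJob output.
counters = {
    "0.7": {"used": 32_355_513, "neglected": 14_954_083, "cooccurrences": 72_503_210},
    "0.9": {"used": 10_840_138, "neglected": 39_175_989, "cooccurrences": 17_645_029},
}


def used_fraction(used, neglected):
    """Share of observations counted as used rather than neglected (hypothetical helper)."""
    return used / (used + neglected)


for version, c in counters.items():
    share = used_fraction(c["used"], c["neglected"])
    print(f"mahout {version}: {share:.1%} of observations used, "
          f"{c['cooccurrences']:,} cooccurrences")
```

On these numbers the 0.9 run uses roughly 22% of the observations, against roughly 68% in 0.7, and produces about a quarter as many cooccurrences.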