Hi, I used org.apache.mahout.cf.taste.hadoop.item.RecommenderJob in Mahout
0.7 (CDH4).
Here are the parameters:
numRecommendations=1000
threshold=0.91
maxSimilaritiesPerItem=1000
maxPrefsPerUserInItemSimilarity=10
similarityClassname=SIMILARITY_LOGLIKELIHOOD

Then I migrated to Mahout 0.9 (CDH5).
I found one difference in the parameters:
maxPrefsPerUserInItemSimilarity was renamed to maxPrefsInItemSimilarity.
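
To make the parameter change explicit, here is a small Python sketch that takes the 0.7 parameters listed above and applies the one rename I noticed. The helper names (to_09, to_args) are made up for illustration, and the "--name value" option style is an assumption about how the options are passed to the job; the sketch does not run Mahout itself.

```python
# Parameters exactly as listed above for the 0.7 run.
PARAMS_07 = {
    "numRecommendations": "1000",
    "threshold": "0.91",
    "maxSimilaritiesPerItem": "1000",
    "maxPrefsPerUserInItemSimilarity": "10",
    "similarityClassname": "SIMILARITY_LOGLIKELIHOOD",
}

# The only rename I noticed when moving from 0.7 to 0.9.
RENAMED_IN_09 = {"maxPrefsPerUserInItemSimilarity": "maxPrefsInItemSimilarity"}


def to_09(params):
    """Return a copy of the 0.7 parameters with the renamed key applied."""
    return {RENAMED_IN_09.get(name, name): value for name, value in params.items()}


def to_args(params):
    """Flatten parameters into ['--name', 'value', ...] (assumed option style)."""
    args = []
    for name, value in params.items():
        args += ["--" + name, value]
    return args


PARAMS_09 = to_09(PARAMS_07)
print(" ".join(to_args(PARAMS_09)))
```

All values are carried over unchanged; only the option name differs between the two runs.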

The other difference is in how the job behaves.
These are the counters I see in 0.7:

USER_RATINGS_NEGLECTED=14954083
USER_RATINGS_USED=32355513
=====
COOCCURRENCES=72503210
PRUNED_COOCCURRENCES=0


And these are the counters in 0.9:

NEGLECTED_OBSERVATIONS=39175989
ROWS=4937362
USED_OBSERVATIONS=10840138
=====
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
COOCCURRENCES=17645029
PRUNED_COOCCURRENCES=0


And 0.9 gives me awful results, just trash.
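
To put the counters above side by side, here is a short Python sketch that computes what share of the ratings each run actually used. It assumes that USER_RATINGS_USED / USER_RATINGS_NEGLECTED in 0.7 mean the same thing as USED_OBSERVATIONS / NEGLECTED_OBSERVATIONS in 0.9, which I have not verified. Under that assumption, 0.7 used roughly 68% of the ratings while 0.9 used roughly 22%, and 0.9 produced about a quarter of the cooccurrences.

```python
# Counter values copied from the two job outputs above.
COUNTERS = {
    "0.7": {"used": 32355513, "neglected": 14954083, "cooccurrences": 72503210},
    "0.9": {"used": 10840138, "neglected": 39175989, "cooccurrences": 17645029},
}


def used_fraction(counters):
    """Share of ratings that were used, out of used + neglected."""
    return counters["used"] / (counters["used"] + counters["neglected"])


for version, counters in COUNTERS.items():
    print(
        f"Mahout {version}: used {used_fraction(counters):.1%} of ratings, "
        f"{counters['cooccurrences']:,} cooccurrences"
    )

# How many cooccurrences 0.9 produced relative to 0.7.
cooccurrence_ratio = (
    COUNTERS["0.9"]["cooccurrences"] / COUNTERS["0.7"]["cooccurrences"]
)
print(f"0.9 / 0.7 cooccurrences: {cooccurrence_ratio:.1%}")
```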

I run both versions over the same dataset.

Mahout 0.7 is on the old production CDH4 cluster,

Mahout 0.9 is on the new CDH5 cluster.



Why is there such a huge difference? Is there any way to fix it?
