Mahout 0.7 and 0.9: difference for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Serega Sheypak Tue, 04 Nov 2014 04:02:20 -0800

Hi, I used org.apache.mahout.cf.taste.hadoop.item.RecommenderJob in Mahout
0.7 (CDH4).
Here are the parameters:
numRecommendations=1000
threshold=0.91
maxSimilaritiesPerItem=1000
maxPrefsPerUserInItemSimilarity=10
similarityClassname=SIMILARITY_LOGLIKELIHOOD


Then I migrated to 0.9 (CDH5).
I've found one difference:
maxPrefsPerUserInItemSimilarity was renamed to maxPrefsInItemSimilarity.
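
To make the rename concrete, here is a minimal Python sketch that builds the argument list for each version. The option names and values are the ones listed above; the `mahout recommenditembased` driver name, the `--input`/`--output` options, and the `/path/to/...` locations are assumptions for illustration and are not taken from the actual job.

```python
# Options that are identical in both versions (values from the list above).
COMMON_OPTIONS = {
    "numRecommendations": "1000",
    "threshold": "0.91",
    "maxSimilaritiesPerItem": "1000",
    "similarityClassname": "SIMILARITY_LOGLIKELIHOOD",
}

# The one option whose name differs between the two versions.
MAX_PREFS_OPTION = {
    "0.7": "maxPrefsPerUserInItemSimilarity",
    "0.9": "maxPrefsInItemSimilarity",
}


def build_args(version, input_path, output_path, max_prefs=10):
    """Return the command-line argument list for the given Mahout version."""
    if version not in MAX_PREFS_OPTION:
        raise ValueError("unsupported version: " + version)
    # Driver name and --input/--output are assumed, not from the original job.
    args = ["mahout", "recommenditembased",
            "--input", input_path, "--output", output_path]
    for name, value in COMMON_OPTIONS.items():
        args += ["--" + name, value]
    args += ["--" + MAX_PREFS_OPTION[version], str(max_prefs)]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_args("0.9", "/path/to/input", "/path/to/output")))
```

Keeping the version-specific name in one lookup table means the same parameter set can be submitted to either cluster without editing the rest of the arguments.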

The other difference is in how it behaves.
I see these counters in 0.7:

USER_RATINGS_NEGLECTED=14954083
USER_RATINGS_USED=32355513
=====
COOCCURRENCES=72503210
PRUNED_COOCCURRENCES=0


Output in 0.9:

NEGLECTED_OBSERVATIONS=39175989
ROWS=4937362
USED_OBSERVATIONS=10840138
=====
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
COOCCURRENCES=17645029
PRUNED_COOCCURRENCES=0
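
To put the two sets of counters side by side, the share of ratings that were actually used can be worked out with a few lines of Python. This is only a sketch: it assumes that USER_RATINGS_USED/USER_RATINGS_NEGLECTED in 0.7 and USED_OBSERVATIONS/NEGLECTED_OBSERVATIONS in 0.9 count comparable things, which may not be the case.

```python
def used_fraction(used, neglected):
    """Fraction of ratings/observations kept after down-sampling."""
    return used / (used + neglected)


# Counter values copied from the job output above.
v07 = used_fraction(used=32355513, neglected=14954083)
v09 = used_fraction(used=10840138, neglected=39175989)

# COOCCURRENCES in 0.9 relative to COOCCURRENCES in 0.7.
cooccurrence_ratio = 17645029 / 72503210

print("0.7 used: {:.1%}".format(v07))
print("0.9 used: {:.1%}".format(v09))
print("0.9 / 0.7 cooccurrences: {:.1%}".format(cooccurrence_ratio))
```

On these numbers, 0.7 keeps roughly two thirds of the ratings while 0.9 keeps roughly one fifth, and 0.9 ends up with about a quarter of the cooccurrences.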


And 0.9 gives me awful results, just trash.

I ran both over the same dataset.

Mahout 0.7 is on the old production CDH4 cluster,
Mahout 0.9 is on the new CDH5 cluster.

Why is there such a huge difference? Is there any way to fix it?
