Hello Mahout team, hello users,
A friend and I are currently evaluating recommendation techniques for
personalizing a newsletter for a company that sells tea, spices, and some
other products. Mahout is a great product that saves me many hours of work
(and a lot of money), so I want to give something back by writing up this
small case study for the mailing list.
I am conducting an offline evaluation to find out which recommender is the
most accurate. I am also interested in runtime behavior, i.e. memory
consumption and execution time.
The data contains implicit feedback. A user's preference for an item is the
amount in grams that he bought of that product (453 g ~ 1 pound). If the
amount is unknown for a product, the preference is set to a default of 50.
So basically I want Mahout to predict how much of a certain product a user
will buy next. This is also helpful for demand planning. I am currently not
using any time data because I did not find a recommender that uses it.
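To make the preprocessing concrete, here is a plain-Java sketch of how the
preference values could be derived (the OrderLine record and field names are
hypothetical, not from my actual code; only the grams-as-preference idea and
the default of 50 are from the setup described above):

```java
import java.util.ArrayList;
import java.util.List;

public class PreferenceBuilder {
    // Hypothetical order line: user, item, and grams bought (null if unknown).
    record OrderLine(long userId, long itemId, Integer grams) {}

    static final int DEFAULT_GRAMS = 50; // fallback when no amount is recorded

    // Turn raw order lines into (user, item, preference) triples,
    // using the amount in grams as the preference value.
    static List<long[]> toPreferences(List<OrderLine> orders) {
        List<long[]> prefs = new ArrayList<>();
        for (OrderLine o : orders) {
            long grams = (o.grams() != null) ? o.grams() : DEFAULT_GRAMS;
            prefs.add(new long[] {o.userId(), o.itemId(), grams});
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<OrderLine> orders = List.of(
            new OrderLine(1, 42, 453),  // ~1 pound of item 42
            new OrderLine(1, 43, null)  // amount unknown -> default of 50
        );
        for (long[] p : toPreferences(orders)) {
            System.out.println(p[0] + "," + p[1] + "," + p[2]);
        }
    }
}
```

The resulting triples can then be written to a CSV file and loaded with a
Mahout FileDataModel.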
Users: 12858
Items: 5467
Preferences: 121304
MaxPreference: 85850.0 (meaning that somebody ordered about 85 kg of a
certain tea or spice)
MinPreference: 50.0
Here are the raw accuracy benchmarks, measured as the average absolute
difference between estimated and actual preference (lower is better). The
numbers vary by roughly 15% between evaluation runs:
Evaluation of randomBased (baseline): 43045.380570443434
(RandomRecommender(model)) (Time: ~0.3 s) (Memory: 16MB)
Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
(GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)) (Time:
~1s) (Memory: 35MB)
Evaluation of ItemBased with uncentered Cosine: 198.25393235323375
(GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model))) (Time:
~1s) (Memory: 32MB)
Evaluation of ItemBased with log likelihood: 176.45243607278724
(GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model))) (Time:
~5s) (Memory: 42MB)
Evaluation of UserBased 3 with Pearson Correlation: 1378.1188069379868
(GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
PearsonCorrelationSimilarity(model), model),
PearsonCorrelationSimilarity(model))) (Time: ~52s) (Memory: 57MB)
Evaluation of UserBased 20 with Pearson Correlation: 1144.1905989614288
(GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
PearsonCorrelationSimilarity(model), model),
PearsonCorrelationSimilarity(model))) (Time: ~51s) (Memory: 57MB)
Evaluation of SlopeOne: 464.8989330869532 (SlopeOneRecommender(model)) (Time:
~4s) (Memory: 604MB)
Evaluation of SVD-based: 326.1050823499026 (ALSWRFactorizer(model, 100, 0.3,
5)) (Time: ) (Memory: 691MB)
These were measured with the following method (note that the first argument
to evaluate() is a RecommenderBuilder that creates the recommender under
test, here named randomBased):

RecommenderEvaluator evaluator =
    new AverageAbsoluteDifferenceRecommenderEvaluator();
double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
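A side note on the metric: AverageAbsoluteDifferenceRecommenderEvaluator
reports the mean absolute difference between estimated and actual preference,
not RMSE (Mahout also ships an RMSRecommenderEvaluator). The difference
matters here because RMSE penalizes large errors, such as a mispredicted
85 kg order, much harder. A plain-Java sketch of the two metrics with
hypothetical predicted/actual values:

```java
public class MetricDemo {
    // Mean absolute difference: average of |predicted - actual|.
    static double meanAbsoluteDifference(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    // RMSE: square root of the mean squared difference;
    // a single large error dominates the score.
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / predicted.length);
    }

    public static void main(String[] args) {
        // Hypothetical gram amounts: one prediction is off by 500 g.
        double[] predicted = {100, 50, 500};
        double[] actual = {150, 50, 1000};
        System.out.println("MAD:  " + meanAbsoluteDifference(predicted, actual));
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```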
Memory usage was about 50 MB in the item-based case. Slope One and the
SVD-based recommender seem to use the most memory (615 MB & 691 MB).
The performance differs a lot. The fastest recommenders were the item-based
ones; they took about 1 to 5 seconds (PearsonCorrelationSimilarity and
UncenteredCosineSimilarity about 1 s, LogLikelihoodSimilarity about 5 s).
The user-based ones were a lot slower.
My conclusion is that, in my case, the item-based approach is the fastest,
the most memory-efficient, and the most accurate. As a bonus, I can use the
recommendedBecause() function.
Here is the spec of the machine:
2.3 GHz Intel Core i5 (4 cores), 1024 MB heap for the Java virtual machine.
As a next step, probably within the next two months, I will design a
newsletter and send it to the customers. Then I can benchmark the acceptance
rate of the recommendations.
Any suggestions for enhancements are appreciated. If anybody is interested in
the dataset or the evaluation code, send me a private email. I might be able
to convince the company to release the dataset to somebody doing interesting
research.
/Manuel
--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B