Hello all,
This email follows up on the Stack Overflow correspondence between Sean Owen
and me. Please see
http://stackoverflow.com/questions/8240383/apache-mahout-performance-issues

I'm building a boolean-based recommendation engine with the following data:

   - 12M users
   - 2M items
   - 18M user-item (boolean) choices

The following code is used to build the recommender:

DataModel dataModel = new FileDataModel(new File(dataFile));
ItemSimilarity itemSimilarity =
    new CachingItemSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
CandidateItemsStrategy candidateItemsStrategy =
    new SamplingCandidateItemsStrategy(20, 5);
MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
    new SamplingCandidateItemsStrategy(20, 5);

this.recommender = new GenericBooleanPrefItemBasedRecommender(
    dataModel, itemSimilarity,
    candidateItemsStrategy, mostSimilarItemsCandidateItemsStrategy);

My app runs on a Tomcat with the following JVM arguments:
*-Xms4096M -Xmx4096M -da -dsa -XX:NewRatio=19 -XX:+UseParallelGC
-XX:+UseParallelOldGC*

Recommendations with the code above work very well for users who have made
1-2 choices in the past, but can take over a minute when a user has made
tens of choices, especially if one of these choices is a very popular item
(i.e. one chosen by many other users).

Even when using the *SamplingCandidateItemsStrategy* with (1,1) arguments,
I still did not manage to achieve fast results.

The only way I managed to get somewhat acceptable results (max recommendation
time ~4 secs) was by rewriting the *SamplingCandidateItemsStrategy* so that
*doGetCandidateItems* returns a limited number of items. The
doGetCandidateItems method as I rewrote it is here:
http://pastebin.com/6n9C8Pw1
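
To illustrate the general idea of limiting the candidate set, here is a minimal standalone sketch. It is not the code from the pastebin link and it does not use any Mahout classes; the class name CandidateCap, the method capCandidates and the cap of 500 are all hypothetical. It assumes the candidate item IDs are already available as a long[] and keeps a uniform random sample of at most maxCandidates of them (reservoir sampling), so the work done per recommendation is bounded no matter how popular the user's items are.

```java
import java.util.Random;

public class CandidateCap {

    /**
     * Returns at most maxCandidates IDs drawn uniformly at random from
     * candidateIds (reservoir sampling). Hypothetical helper, not a Mahout API.
     */
    public static long[] capCandidates(long[] candidateIds, int maxCandidates, Random random) {
        if (maxCandidates <= 0) {
            return new long[0];
        }
        if (candidateIds.length <= maxCandidates) {
            // Already small enough: return everything.
            return candidateIds.clone();
        }
        // Fill the reservoir with the first maxCandidates IDs.
        long[] reservoir = new long[maxCandidates];
        System.arraycopy(candidateIds, 0, reservoir, 0, maxCandidates);
        // Each later ID replaces a random slot with probability maxCandidates / (i + 1).
        for (int i = maxCandidates; i < candidateIds.length; i++) {
            int j = random.nextInt(i + 1);
            if (j < maxCandidates) {
                reservoir[j] = candidateIds[i];
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        long[] ids = new long[100_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        long[] capped = capCandidates(ids, 500, new Random(42));
        System.out.println(capped.length); // prints 500
    }
}
```

The trade-off is that a hard cap may drop good candidates for users whose choices include very popular items, so the cap probably needs tuning against recommendation quality.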

**I think a good response time for recommendations should be less than a
second (preferably less than 500 milliseconds).**
How can I make Mahout perform better? I have a feeling some optimization is
needed both on the *CandidateItemsStrategy* and the *Recommender* itself.
Thanks in advance!
Daniel
