Hello all,

This email follows my correspondence with Sean Owen on Stack Overflow. Please see http://stackoverflow.com/questions/8240383/apache-mahout-performance-issues

I'm building a boolean-based recommendation engine with the following data:

- 12M users
- 2M items
- 18M user-item (boolean) choices

The following code is used to build the recommender:

    DataModel dataModel = new FileDataModel(new File(dataFile));
    ItemSimilarity itemSimilarity = new CachingItemSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
    CandidateItemsStrategy candidateItemsStrategy = new SamplingCandidateItemsStrategy(20, 5);
    MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy = new SamplingCandidateItemsStrategy(20, 5);
    this.recommender = new GenericBooleanPrefItemBasedRecommender(dataModel, itemSimilarity, candidateItemsStrategy, mostSimilarItemsCandidateItemsStrategy);

My app runs on Tomcat with the following JVM arguments:

*-Xms4096M -Xmx4096M -da -dsa -XX:NewRatio=19 -XX:+UseParallelGC -XX:+UseParallelOldGC*

Recommendations with the code above work very well for users who have made 1-2 choices in the past, but can take around a minute when a user has made tens of choices, especially if one of these choices is a very popular item (i.e. one chosen by many other users).

Even when using the *SamplingCandidateItemsStrategy* with (1,1) arguments, I still did not manage to achieve fast results. The only way I managed to get somewhat acceptable results (maximum recommendation time of about 4 seconds) was by rewriting the *SamplingCandidateItemsStrategy* so that *doGetCandidateItems* returns a limited number of items. Following is the doGetCandidateItems method as I rewrote it: http://pastebin.com/6n9C8Pw1

**I think a good response time for recommendations should be less than a second (preferably less than 500 milliseconds).**

How can I make Mahout perform better? I have a feeling some optimization is needed both in the *CandidateItemsStrategy* and in the *Recommender* itself.

Thanks in advance!
Daniel
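
For illustration only, the general idea of having a candidate strategy return a limited number of items can be sketched as below. This is not the pastebin code and does not use any Mahout classes: the class name CandidateCap, the method capCandidates and the maxCandidates parameter are made up for this example, and reservoir sampling is just one possible way to pick the subset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: not part of Mahout and not the pastebin code.
final class CandidateCap {

    private CandidateCap() {
    }

    /**
     * Returns at most maxCandidates item IDs, chosen uniformly at random
     * from itemIDs using reservoir sampling (a single pass over the input).
     */
    static List<Long> capCandidates(Iterable<Long> itemIDs, int maxCandidates, Random random) {
        if (maxCandidates < 0) {
            throw new IllegalArgumentException("maxCandidates must be >= 0");
        }
        List<Long> reservoir = new ArrayList<>();
        if (maxCandidates == 0) {
            return reservoir;
        }
        int seen = 0;
        for (Long itemID : itemIDs) {
            seen++;
            if (reservoir.size() < maxCandidates) {
                // Fill the reservoir with the first maxCandidates items.
                reservoir.add(itemID);
            } else {
                // Keep the new item with probability maxCandidates / seen.
                int slot = random.nextInt(seen);
                if (slot < maxCandidates) {
                    reservoir.set(slot, itemID);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Long> allItemIDs = new ArrayList<>();
        for (long id = 0; id < 100_000; id++) {
            allItemIDs.add(id);
        }
        List<Long> capped = capCandidates(allItemIDs, 100, new Random(42));
        System.out.println(capped.size()); // prints 100
    }
}
```

With a cap like this, the number of similarity computations per request is bounded by maxCandidates regardless of how popular the user's items are. The trade-off is that some potentially good candidates are never scored.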
