Hi all, I am new to Mahout and I am building a recommender for buddycloud (http://buddycloud.com/) as part of my GSoC project (https://github.com/buddycloud/channel-directory). In the testing snapshot, I have ~100k users, ~20k items and ~230k boolean taste preferences. At first I tried a UserBasedRecommender with an all-in-memory DataModel (read from a dump file into a GenericDataModel). Recommendations were fast, almost real time. However, I don't think this strategy will scale: as the number of users and items grows, the service could run out of memory.
Then I tried a PostgreSQLBooleanPrefJDBCDataModel and, as expected, performance dropped drastically. After reading the blog post at http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/, I decided to try an ItemBasedRecommender backed by a precomputed ItemSimilarity table. I would like to avoid MapReduce at first, so I tried to compute the LogLikelihood similarity for every pair of items. This took too long, and I gave up.

Finally, my questions are: Am I doing things right? What is the best way to compute item similarity offline without MapReduce?

Thanks in advance!
Abmar

--
Abmar Barros
MSc candidate on Computer Science at Federal University of Campina Grande - www.ufcg.edu.br
OurGrid Team Member - www.ourgrid.org
Paraíba - Brazil
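
P.S. To make the pairwise computation mentioned above concrete, here is a minimal sketch in plain Java. It does not use Mahout's API. The class name LlrSketch, its method names and the "a:b" pair key are all hypothetical. It counts only item pairs that actually co-occur for some user, instead of looping over all ~20k x 20k pairs, and then scores each pair with Dunning's log-likelihood ratio. The final mapping 1 - 1/(1 + llr) is an assumption about how to squash the score into [0, 1), not something taken from the thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LlrSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    // k11: users with both items, k12: only the first item,
    // k21: only the second item, k22: neither item.
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // itemsByUser holds one array of distinct item ids per user.
    // Returns a similarity for every pair "a:b" (a < b) seen together at least once.
    public static Map<String, Double> similarities(List<long[]> itemsByUser) {
        Map<Long, Integer> itemCounts = new HashMap<>();
        Map<String, Integer> pairCounts = new HashMap<>();
        for (long[] items : itemsByUser) {
            for (int i = 0; i < items.length; i++) {
                itemCounts.merge(items[i], 1, Integer::sum);
                for (int j = i + 1; j < items.length; j++) {
                    long a = Math.min(items[i], items[j]);
                    long b = Math.max(items[i], items[j]);
                    pairCounts.merge(a + ":" + b, 1, Integer::sum);
                }
            }
        }
        long numUsers = itemsByUser.size();
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            String[] ids = e.getKey().split(":");
            long k11 = e.getValue();
            long k12 = itemCounts.get(Long.parseLong(ids[0])) - k11;
            long k21 = itemCounts.get(Long.parseLong(ids[1])) - k11;
            long k22 = numUsers - k11 - k12 - k21;
            double llr = logLikelihoodRatio(k11, k12, k21, k22);
            result.put(e.getKey(), 1.0 - 1.0 / (1.0 + llr)); // assumed squashing into [0, 1)
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data: two users share items 1 and 2, two other users only have item 3.
        List<long[]> prefs = Arrays.asList(
                new long[] {1, 2}, new long[] {1, 2}, new long[] {3}, new long[] {3});
        System.out.println(similarities(prefs));
    }
}
```

The work here grows with the sum of (items per user)^2 rather than with the square of the item count, which is small when most users have only a few preferences. A handful of very active users can still dominate the cost, so capping or sampling the items per user may be needed. The resulting pairs could then be written to the similarity table.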