ItemSimilarity pre-processing

Abmar Barros Tue, 12 Jul 2011 08:33:24 -0700

Hi all,

I am new to Mahout and I am building a recommender for buddycloud (
http://buddycloud.com/) as part of my GSoC project (
https://github.com/buddycloud/channel-directory).
In the testing snapshot, I have ~100k users, ~20k items and ~230k boolean
taste preferences.
At first I tried a UserBasedRecommender with an all-in-memory DataModel
(read from a dump file into a GenericDataModel). Recommendations were fast,
almost real time. However, I don't think this strategy will scale: as the
number of users and items grows, the service could run out of memory.


Then I tried a PostgreSQLBooleanPrefJDBCDataModel and, as expected,
performance dropped drastically. After reading the blog post at
http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/,
I decided to try an ItemBasedRecommender backed by a precomputed
ItemSimilarity table. I am trying to avoid MapReduce for now, so I
tried to compute the LogLikelihood similarity for every pair of items. This
took too long, and I gave up.
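
For reference, the all-pairs computation described above amounts to
something like the sketch below. It is plain Java rather than Mahout code:
the class PairwiseLlr and its method names are hypothetical, the in-memory
map of item ID to user IDs is an assumed input format, and the mapping of
the log-likelihood ratio to a score in [0, 1) is an assumption as well.
With ~20k items the double loop visits about 200 million pairs, which is
consistent with the run taking too long.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: brute-force log-likelihood similarity over boolean preferences.
final class PairwiseLlr {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a list of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 contingency table:
    // k11 = users with both items, k12 = only A, k21 = only B, k22 = neither.
    static double llr(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (row + col - mat));
    }

    // Similarity of two items given the sets of users who prefer each one.
    // Assumption: the ratio is squashed into [0, 1) as 1 - 1 / (1 + llr).
    static double similarity(Set<Long> usersA, Set<Long> usersB, long numUsers) {
        Set<Long> small = usersA.size() <= usersB.size() ? usersA : usersB;
        Set<Long> large = small == usersA ? usersB : usersA;
        long both = 0;
        for (Long user : small) {
            if (large.contains(user)) {
                both++;
            }
        }
        long onlyA = usersA.size() - both;
        long onlyB = usersB.size() - both;
        long neither = numUsers - both - onlyA - onlyB;
        return 1.0 - 1.0 / (1.0 + llr(both, onlyA, onlyB, neither));
    }

    // Visits every unordered item pair: n * (n - 1) / 2 similarity computations.
    // Keys have the form "smallerItemId:largerItemId".
    static Map<String, Double> allPairs(Map<Long, Set<Long>> usersByItem, long numUsers) {
        List<Long> items = new ArrayList<>(usersByItem.keySet());
        Collections.sort(items);
        Map<String, Double> result = new HashMap<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                double s = similarity(usersByItem.get(items.get(i)),
                                      usersByItem.get(items.get(j)), numUsers);
                if (s > 0.0) {
                    result.put(items.get(i) + ":" + items.get(j), s);
                }
            }
        }
        return result;
    }
}
```

In a real run the resulting pairs would be written to the similarity table
instead of being held in a map; the map only keeps the sketch short.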

Finally, my questions are: Am I doing things right? What is the best way to
compute item similarity offline without MapReduce?

Thanks in advance!
Abmar

-- 
Abmar Barros
MSc candidate on Computer Science at Federal University of Campina Grande -
www.ufcg.edu.br
OurGrid Team Member - www.ourgrid.org
Paraíba - Brazil
