I am trying out Mahout to come up with product recommendations for users
based on data that show what products they use today.
The data is not web-scale, just about 300,000 users and 7 products. A few
comments about the data:
1. Since users either have or do not have a particular product, the value in
the matrix is either "1" or "0" for all the columns (rows being the user IDs).
2. All the users have one basic product, so I excluded it from the
data model passed to the Mahout recommender, since I assume that if everyone
has the same product, its effect on the recommendations is trivial.
3. The matrix itself is sparse; the total count of users having each
product is:
A=31847 | 54754 | 1897 | 23154 | 2201 | 2766 | 33585
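
For illustration only, here is a minimal sketch of how one row of such a 0/1 matrix could be flattened into the "userID,itemID,value" lines that FileDataModel reads. The class name OwnershipCsv, the method toLines, the use of column indexes as item IDs, and the choice to write a line only for owned products are all assumptions made for this sketch; it is not necessarily how products.csv was actually built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout.
class OwnershipCsv {
    // Turns one user's 0/1 ownership row into "userID,itemID,1" lines.
    // Assumption: only owned products get a line; 0 cells are left out.
    // skipItem is the column of the product everyone has (comment 2 above);
    // pass -1 to keep every column.
    static List<String> toLines(long userId, int[] owns, int skipItem) {
        List<String> lines = new ArrayList<>();
        for (int item = 0; item < owns.length; item++) {
            if (item == skipItem || owns[item] == 0) {
                continue;
            }
            lines.add(userId + "," + item + ",1");
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] row = {1, 0, 1, 0, 0, 1, 0}; // made-up user, 7 products
        for (String line : toLines(42L, row, 0)) {
            System.out.println(line);
        }
    }
}
```

With the made-up row above and column 0 treated as the basic product, this prints one line each for items 2 and 5.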

Steps followed:
1. Created a data source from the user-product table in the database:
        File ratingsFile = new File("datasets/products.csv");
        DataModel model = new FileDataModel(ratingsFile);
2. Created a recommender on this data:
        CachingRecommender recommender =
            new CachingRecommender(new SlopeOneRecommender(model));
3. Looped through all users and got the top ten recommendations for each:
        List<RecommendedItem> recommendations = recommender.recommend(userId, 10);

Issue faced:
The problem I am facing is that the recommendations that come out are way
too simple. All that seems to be recommended is: "if a user does not have
product A, then recommend it; if they don't have product B, then recommend
it; and so on." Basically a simple inverse of their ownership status.
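
To make "simple inverse" concrete, a small check like the following could compare a user's recommended item IDs with the set of products that user does not own. It is only a sketch: the class InverseCheck and its methods are hypothetical, column indexes stand in for item IDs, and the recommended IDs are assumed to have been collected beforehand (for example via RecommendedItem.getItemID()) rather than taken from a live recommender.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
class InverseCheck {
    // Item IDs (here: column indexes) of the products the user does not own.
    static Set<Long> notOwned(int[] owns) {
        Set<Long> missing = new HashSet<>();
        for (int item = 0; item < owns.length; item++) {
            if (owns[item] == 0) {
                missing.add((long) item);
            }
        }
        return missing;
    }

    // True when the recommended IDs are exactly the unowned products,
    // ignoring the order in which they were recommended.
    static boolean isPlainInverse(int[] owns, List<Long> recommendedIds) {
        Set<Long> recommended = new HashSet<>(recommendedIds);
        return notOwned(owns).equals(recommended);
    }

    public static void main(String[] args) {
        int[] row = {1, 0, 1, 0, 0, 1, 0}; // made-up user, 7 products
        System.out.println(isPlainInverse(row, Arrays.asList(1L, 3L, 4L, 6L)));
        System.out.println(isPlainInverse(row, Arrays.asList(3L, 6L)));
    }
}
```

Counting how many users pass this check would show whether the output really is nothing more than the inverse of ownership, or whether that only holds for some users.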

Obviously, I am not doing something right here. How can I model the data
better to get the right recommendations? Or is it that my dataset (300,000
users times 7 products) is too small for Mahout to work with?

Look forward to your comments. Thanks.
