Hi all,

I have a product recommendation use case for an e-commerce site, and I've been playing around with Mahout's CF capabilities lately. I've reached a point where I'd like to ask the community for feedback on my approach: I'm struggling to get ANY recommendations.
Here's what I've done so far:

1) Collected data in this format:

    USER_ID,PRODUCT_ID,VIEW,LIKE,FAVORITE,PURCHASE
    0001,3333,1,0,1,1

2) Built preferences for each user based on a score in [0,1] computed from the attributes (view, like, favorite, purchase).

3) Implemented a custom ItemSimilarity class to boost products of the same type, from the same manufacturer, with similar prices, etc.

4) Implemented a custom IDRescorer to filter out products that are no longer available.

5) Implemented a Recommender subclass that simply instantiates a GenericItemBasedRecommender with my custom similarity class:

    recommender = new GenericItemBasedRecommender(myModel, new ProductSimilarity());

It happens that most of the product/preference data is historical, and most products are no longer available. Because of this, I get zero recommendations when the IDRescorer's filter is activated, even though there are lots of valid products. (Yes, I checked the isFiltered method: it returns true for expired products and false otherwise, as expected.)

Here's the overridden method in the IDRescorer class:

    public boolean isFiltered(long id) {
        Product a = PreferencesDataModel.lookupProduct(id);
        return !a.isActive(); // filter out expired products
    }

I see it as beneficial to feed the engine historical data on expired products and then filter those products out at recommendation time, but getting zero recommendations has made me rethink this approach. (I also tried different similarity metrics, including UserSim.)

What do you guys think?
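For reference, step 2 could be sketched roughly as below. This is only a minimal illustration, assuming a plain weighted sum over the four flags: the weights, the class name PreferenceScore, and both method names are hypothetical and are not taken from the actual implementation.

```java
/**
 * Illustrative sketch only: combines the four interaction flags from one
 * data row into a single preference value in [0,1].
 * The weights below are assumed for the example; they sum to 1.0 so that
 * a row with every flag set scores 1.0.
 */
final class PreferenceScore {

    // Hypothetical weights, one per attribute.
    static final double W_VIEW = 0.1;
    static final double W_LIKE = 0.2;
    static final double W_FAVORITE = 0.3;
    static final double W_PURCHASE = 0.4;

    private PreferenceScore() {
    }

    /** Treats any non-zero value as "the interaction happened". */
    private static int flag(int value) {
        return value != 0 ? 1 : 0;
    }

    /** Weighted sum of the four flags; the result lies in [0,1]. */
    static double score(int view, int like, int favorite, int purchase) {
        return W_VIEW * flag(view)
                + W_LIKE * flag(like)
                + W_FAVORITE * flag(favorite)
                + W_PURCHASE * flag(purchase);
    }

    /** Scores a row in the USER_ID,PRODUCT_ID,VIEW,LIKE,FAVORITE,PURCHASE format. */
    static double scoreRow(String csvLine) {
        String[] cols = csvLine.split(",");
        if (cols.length < 6) {
            throw new IllegalArgumentException("Expected 6 columns: " + csvLine);
        }
        // Columns 0 and 1 are the user and product IDs; the flags start at column 2.
        return score(
                Integer.parseInt(cols[2].trim()),
                Integer.parseInt(cols[3].trim()),
                Integer.parseInt(cols[4].trim()),
                Integer.parseInt(cols[5].trim()));
    }
}
```

With these assumed weights, the sample row from step 1 (view, favorite and purchase set, like unset) scores 0.1 + 0.3 + 0.4, so PreferenceScore.scoreRow("0001,3333,1,0,1,1") returns 0.8 up to floating-point rounding.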