You are using the in-memory recommender (not the Hadoop version)? Note that it may not scale well.
The in-memory and Hadoop versions of the recommender *require* user and item IDs to be non-negative contiguous integers. You must map your own IDs to Mahout IDs and back again; inside Mahout, *only* Mahout IDs are used. I'm not sure what you are asking about "indexes".

BTW, the new Spark-Mahout v1.0 snapshot version of the recommender has no such restriction on user and item IDs. See the description here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html It is much easier to use with MongoDB, especially if you index certain document fields with Solr, which it requires to deliver recommendations.

On Feb 11, 2015, at 6:15 AM, 黄雅冠 <[email protected]> wrote:

> Hi!
>
> I am using the Mahout item-based recommender with MongoDB. I have played around with it and have several questions.
>
> - How do I persist the recommendation model from memory to disk? I know it is an old question and there are already several discussions, such as this one <http://mail-archives.apache.org/mod_mbox/mahout-user/201112.mbox/%3ccanq80da42nfr8p5mt-qnbo-ycaxyfrbskyoefairdzyrdy-...@mail.gmail.com%3E>. The upshot was that I have to do it myself. I am just wondering whether anything has been implemented in the two years since.
>
> - Is it better to create indexes on the collection (the one that provides the preference data)? I read the source and found several queries on the collection, such as (user_id, item_id), (user_id), and (item_id). Also, when refresh is called, it scans the whole collection to find new data, so (create_at) as well. Would I benefit from ensuring indexes on these fields? If yes, which indexes should I create?
>
> - From what I understand, I can use refreshData to achieve event-driven refresh. That is, when an event occurs (a user rates an item), I can call refresh to update the model. That should perform better and keep the model up to date. Am I right?
>
> Thanks!
>
> — hyg
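P.S. A minimal sketch of the bidirectional ID mapping described above, using plain Python dicts rather than any Mahout API (Mahout's taste library has its own ID-translation machinery; all names here are illustrative). The point is just that each external ID gets the next contiguous non-negative integer, and you keep the reverse lookup so results can be translated back:

```python
class IdMapper:
    """Illustrative mapper: arbitrary external IDs (e.g. MongoDB
    ObjectId strings) <-> contiguous non-negative integers."""

    def __init__(self):
        self._to_internal = {}  # external ID -> contiguous int ID
        self._to_external = []  # int ID -> external ID (index = int ID)

    def to_internal(self, external_id):
        # Assign the next contiguous integer the first time we see an ID.
        if external_id not in self._to_internal:
            self._to_internal[external_id] = len(self._to_external)
            self._to_external.append(external_id)
        return self._to_internal[external_id]

    def to_external(self, internal_id):
        # Reverse lookup, used when returning recommendations.
        return self._to_external[internal_id]


mapper = IdMapper()
u0 = mapper.to_internal("54db2c3a9f1b2c0001a3e9f1")  # hypothetical ObjectId
u1 = mapper.to_internal("54db2c3a9f1b2c0001a3e9f2")
assert (u0, u1) == (0, 1)
assert mapper.to_external(u0) == "54db2c3a9f1b2c0001a3e9f1"
```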
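P.P.S. On the index question: given the query shapes listed in the original message, a compound index on (user_id, item_id) also serves queries on user_id alone (the prefix rule), so three indexes should cover them plus the refresh scan. A hedged sketch using PyMongo; the connection string, database, collection, and timestamp field name are assumptions and should match your actual MongoDBDataModel configuration:

```python
from pymongo import MongoClient, ASCENDING

# Hypothetical connection details; adjust to your deployment.
client = MongoClient("mongodb://localhost:27017")
prefs = client["recommender"]["preferences"]

# Compound index: serves (user_id, item_id) and (user_id) queries.
prefs.create_index([("user_id", ASCENDING), ("item_id", ASCENDING)])
# Serves (item_id) queries.
prefs.create_index([("item_id", ASCENDING)])
# Speeds up the refresh scan for newly added preferences; use whatever
# timestamp field your data model is actually configured with.
prefs.create_index([("created_at", ASCENDING)])
```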
