"Do you understand the requirement for Mahout IDs? It is *required* that user and item IDs be non-negative contiguous integers. You must map your IDs to Mahout-IDs and back again. Inside Mahout *only* Mahout-IDs are used."
You cannot use userid : ObjectId(…) for the Mahout IDs, as stated above, but any data type can be indexed in Mongo, so the Mahout IDs can be stored and indexed there. A Mahout ID is a non-negative integer and corresponds to a row number (userid) or column number (itemid) of the matrix of all input.

On Feb 11, 2015, at 12:09 PM, 黄雅冠 wrote:

Say I have collection A containing the data to be trained. Each doc is like {_id : ObjectId(...), userid : ObjectId(...), itemid : ObjectId(...), value:1, create_at:14123456 }. Should I create an index on fields other than the default _id?

On Feb 12, 2015, at 3:52 AM, "Pat Ferrel" wrote:

> Do you understand the requirement for Mahout IDs?
>
> Still don’t understand your index question. Mongo *can* store the Mahout
> IDs and index them. In this case you would have a Mongo ObjectId, your own
> application-specific ID (catalog number, username, etc.), and the Mahout ID
> (0..n). You could look up by any of these.
>
> On Feb 11, 2015, at 11:30 AM, 黄雅冠 wrote:
>
> Thanks for the reply.
>
> Yes, I am using the in-memory version because of the learning curve for a
> beginner.
>
> The index I mentioned is a Mongo index on the collection. Will it be quicker
> if I ensure some indexes before training?
>
> I use Maven to manage the project. Is 1.0 accessible via Maven? Is it a
> beta version or a stable one?
>
> On Feb 12, 2015, at 3:05 AM, "Pat Ferrel" wrote:
>
>> You are using the in-memory recommender (not the Hadoop version)? Note that
>> this may not scale well.
>>
>> The in-memory and Hadoop versions of the recommender *require* user and
>> item IDs to be non-negative contiguous integers. You must map your IDs to
>> Mahout-IDs and back again. Inside Mahout *only* Mahout-IDs are used.
>>
>> Not sure what you are asking about “indexes”.
>>
>> BTW the new Spark-Mahout v1.0 snapshot version of the recommender has no
>> such restriction on user and item IDs.
>> See a description here:
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> It is much easier to use with MongoDB, especially if you index certain
>> document fields with Solr, which it requires to deliver recommendations.
>>
>> On Feb 11, 2015, at 6:15 AM, 黄雅冠 wrote:
>>
>> Hi!
>>
>> I am using Mahout item-based recommendation with MongoDB. I have played
>> around with it and have several questions.
>>
>> - How do I persist the recommendation model from memory to disk? I know it
>> is an old question and several discussions already exist, such as this one:
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201112.mbox/%3ccanq80da42nfr8p5mt-qnbo-ycaxyfrbskyoefairdzyrdy-...@mail.gmail.com%3E
>> The upshot was that I have to do it myself. I am just wondering: is there
>> any implementation after two years?
>>
>> - Is it better to set indexes on the collection (the one that provides the
>> preference data)? I read the source and found some queries on the
>> collection, such as (user_id, item_id), (user_id), (item_id). Also, when
>> refresh is called, it scans the whole collection to find new data, so
>> (create_at). Would I benefit from ensuring indexes on these fields? If yes,
>> which indexes should I ensure?
>>
>> - From what I understand, I can use refreshData to achieve event-driven
>> refresh. That is, when an event occurs (a user rates an item), I can call
>> refresh to refresh the model. That would perform better and keep the model
>> up to date. Am I right?
>>
>> Thanks!
>>
>> — hyg
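The ID mapping described in the thread (application IDs to non-negative contiguous integers and back again) can be sketched in Java roughly as follows. `IdMapper` is a hypothetical helper, not part of Mahout's API. It assigns IDs in first-seen order starting at 0, so a Mahout user ID doubles as a matrix row number and a Mahout item ID as a column number. The string IDs in `main` are made-up stand-ins for ObjectId values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): maps application IDs, such as
 * MongoDB ObjectId strings, to contiguous non-negative integers (0, 1, 2, ...)
 * and back again.
 */
public class IdMapper<T> {
    // External ID -> Mahout ID (forward lookup).
    private final Map<T, Integer> toMahout = new HashMap<>();
    // Mahout ID -> external ID; the list index *is* the Mahout ID (reverse lookup).
    private final List<T> toExternal = new ArrayList<>();

    /** Returns the Mahout ID for an external ID, assigning the next free integer if it is new. */
    public int toMahoutId(T externalId) {
        Integer id = toMahout.get(externalId);
        if (id == null) {
            id = toExternal.size();           // next contiguous integer
            toMahout.put(externalId, id);
            toExternal.add(externalId);
        }
        return id;
    }

    /** Maps a Mahout ID back to the original external ID. */
    public T toExternalId(int mahoutId) {
        return toExternal.get(mahoutId);      // throws IndexOutOfBoundsException if never assigned
    }

    /** Number of IDs assigned so far, i.e. the row (or column) count of the matrix. */
    public int size() {
        return toExternal.size();
    }

    public static void main(String[] args) {
        IdMapper<String> users = new IdMapper<>();
        IdMapper<String> items = new IdMapper<>();

        // Illustrative (userid, itemid) pairs; real data would come from the Mongo collection.
        String[][] prefs = {
            {"user-A", "item-X"},
            {"user-B", "item-Y"},
            {"user-A", "item-Y"},
        };

        // First pass: assign Mahout IDs so the matrix dimensions are known.
        int[][] cells = new int[prefs.length][];
        for (int i = 0; i < prefs.length; i++) {
            cells[i] = new int[] {users.toMahoutId(prefs[i][0]), items.toMahoutId(prefs[i][1])};
        }

        // Second pass: Mahout user IDs are row numbers, Mahout item IDs are column numbers.
        double[][] matrix = new double[users.size()][items.size()];
        for (int[] c : cells) {
            matrix[c[0]][c[1]] = 1.0;         // same "value: 1" as in the example documents
        }

        // Map a Mahout row number back to the application's user ID.
        System.out.println("row 1 = " + users.toExternalId(1));
        System.out.println(Arrays.deepToString(matrix));
    }
}
```

Running `main` prints `row 1 = user-B` followed by `[[1.0, 1.0], [0.0, 1.0]]`. The mapping lives only in memory here; to reuse the same Mahout IDs across runs it would have to be persisted, for example by storing the Mahout ID as an extra field next to the ObjectId and the application-specific ID in each Mongo document, as suggested in the thread.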
