Re: Is it better to set index when using recommend with mongodb?

2015-02-11 Thread 黄雅冠
Say I have a collection A containing the training data. Each doc looks like { _id: ObjectId(...), userid: ObjectId(...), itemid: ObjectId(...), value: 1, create_at: 14123456 }. Should I create an index on fields other than the default _id? On 2015-02-12 at 3:52 AM, "Pat Ferrel" wrote: > Do you understand the req

Re: Is it better to set index when using recommend with mongodb?

2015-02-11 Thread Pat Ferrel
Do you understand the requirement for Mahout IDs? I still don’t understand your index question. Mongo *can* store the Mahout IDs and index them. In this case you would have a Mongo ObjectId, your own application-specific ID (catalog number, username, etc.), and the Mahout ID (0..n). You could loo

Re: Is it better to set index when using recommend with mongodb?

2015-02-11 Thread 黄雅冠
Thanks for the reply. Yes, I am using the in-memory version because of the learning curve for a beginner. The index I mentioned is a Mongo index on the collection. Will it be quicker if I ensure some indexes before training? I use Maven to manage the project. Is 1.0 accessible via Maven? Is it a beta versi

Re: Is it better to set index when using recommend with mongodb?

2015-02-11 Thread Pat Ferrel
You are using the in-memory recommender (not Hadoop version)? Note that this may not scale well. The in-memory and Hadoop versions of the recommender *require* user and item IDs to be non-negative contiguous integers. You must map your IDs to Mahout-IDs and back again. Inside Mahout *only* Maho
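The mapping between application IDs and Mahout IDs described above can be sketched as follows. This is a minimal illustration, not Mahout code: the class name `IdMapper` and its methods are hypothetical, and it assumes the application IDs are available as strings (for example, the hex form of a Mongo ObjectId). Each new ID receives the next integer starting at 0, so the assigned IDs stay non-negative and contiguous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): assigns contiguous IDs 0..n-1
 * to application-specific IDs and maps them back again.
 */
class IdMapper {
    // application ID -> assigned contiguous ID
    private final Map<String, Long> toMahout = new HashMap<>();
    // assigned contiguous ID (list position) -> application ID
    private final List<String> toExternal = new ArrayList<>();

    /** Returns the existing ID for externalId, or assigns the next free one. */
    long toMahoutId(String externalId) {
        Long existing = toMahout.get(externalId);
        if (existing != null) {
            return existing;
        }
        long next = toExternal.size();
        toMahout.put(externalId, next);
        toExternal.add(externalId);
        return next;
    }

    /** Maps an assigned ID back to the application ID it came from. */
    String toExternalId(long mahoutId) {
        return toExternal.get((int) mahoutId);
    }

    /** Number of distinct IDs seen so far. */
    int size() {
        return toExternal.size();
    }
}
```

Users and items would each get their own mapper instance, since both ID spaces start at 0. The mapping has to be persisted somewhere (Mongo is one option) so that recommendations can be translated back to application IDs later.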

Re: weighted ALS in Mahout

2015-02-11 Thread Pat Ferrel
1) This is a Hadoop MapReduce job, so the speed is related to how many nodes you have in the cluster; increase them. 2) Runtime is also dependent on the size of your data. How many users and items? You set "numIterations 1-"; is that a typo? If #1 and #2 do not explain runtime try starting from the

weighted ALS in Mahout

2015-02-11 Thread Hartwig Anzt
Dear Mahout-Users, I would like to use the ALS implementation available in Mahout as a reference in a performance evaluation. The challenge for me, as I have little knowledge about the Mahout implementation, is to ensure that the exact same setup is running. I want to obtain timings for the al

Is it better to set index when using recommend with mongodb?

2015-02-11 Thread 黄雅冠
Hi! I am using Mahout item-based recommendation with MongoDB. I have been playing around with it and have several questions. - How do I persist the recommender model from memory to disk? I know it is an old question and there already exist several discussions, such as this one

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-11 Thread Juanjo Ramos
Yes. Your approach sounds about right. As far as I know, you cannot just pass Mahout a file of user similarities and have it create a UserSimilarity object, as it can with the DataModel. When I have done something like that in the past, you need to build your own code for parsing the fi
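A minimal sketch of the file-parsing step mentioned above, in plain Java with no Mahout dependency. Several things here are assumptions for illustration: the class name `PrecomputedSimilarities`, the line format `userID1,userID2,similarity`, the symmetric lookup, and the choice to return NaN for pairs that are not in the file. A class implementing Mahout's UserSimilarity interface could delegate its lookup to something like this.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical store for pre-computed user similarities.
 * Assumed input format, one pair per line: userID1,userID2,similarity
 */
class PrecomputedSimilarities {
    private final Map<String, Double> scores = new HashMap<>();

    // Order-independent key, so (1,2) and (2,1) share one entry.
    private static String key(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    /** Parses the given lines, skipping blank ones. */
    static PrecomputedSimilarities fromLines(List<String> lines) {
        PrecomputedSimilarities result = new PrecomputedSimilarities();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(",");
            long user1 = Long.parseLong(parts[0].trim());
            long user2 = Long.parseLong(parts[1].trim());
            double similarity = Double.parseDouble(parts[2].trim());
            result.scores.put(key(user1, user2), similarity);
        }
        return result;
    }

    /** Returns the stored similarity, or NaN if the pair is unknown (an assumption of this sketch). */
    double similarity(long user1, long user2) {
        Double value = scores.get(key(user1, user2));
        return value == null ? Double.NaN : value;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines` on the text file. The user IDs in the file would need to be the same long IDs used in the DataModel.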

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-11 Thread Eugenio Tacchini
Yes, I know I can implement a custom user similarity, but what I want to do is pass Mahout fixed, pre-computed user similarities that I have already stored in a text file, in the easiest way possible, since I am not a Java programmer. If there is no way to do it, I will implement CustomUserSimilari

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-11 Thread Juanjo Ramos
You can create your own custom class with your similarity implementation. All you need is for that class to implement the UserSimilarity interface, and then to use it here: UserSimilarity similarity = new PearsonCorrelationSimilarity(dm); instead of PearsonCorrelationSimilarity. UserSimilarity similarity = new

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-11 Thread Eugenio Tacchini
Hello Pat, and thanks for your reply. I know that when users >> items, item-based normally works better, and I don't assume my similarity metric works better, but for research purposes I have to compare: - RMSE produced by a Pearson correlation user-based algorithm VS - RMSE produced by a user-based