Thanks for the very good explanation about ratings. I agree with you. I am only using MovieLens for testing; a real application that offers cooccurrence-based recommendations has to use the users' actions as its training set.
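For the MovieLens test, this is roughly what I have in mind: treat every rating row as a "watched" action, optionally toss the very low ratings, and feed the resulting (user, item) pairs to spark-itemsimilarity. It is only a sketch. The tab-separated user/item/rating/timestamp layout is that of the ml-100k u.data file; the function name, the "watched" action name and the min_rating threshold are my own assumptions, not anything defined by Mahout, and the sample rows are made up.

```python
def ratings_to_actions(lines, min_rating=None):
    """Turn 'user<TAB>item<TAB>rating<TAB>timestamp' rows into (user, item) pairs.

    The rating value is not kept. It is only used to optionally drop rows
    whose rating is below min_rating (a hypothetical threshold).
    """
    actions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        if min_rating is not None and float(rating) < min_rating:
            continue  # toss very low ratings
        actions.append((user, item))
    return actions


# Made-up rows in the ml-100k column layout (not real MovieLens data).
sample_rows = [
    "u1\tm10\t4\t0",
    "u2\tm11\t1\t0",
    "u3\tm12\t5\t0",
]

watched = ratings_to_actions(sample_rows, min_rating=2)
for user, item in watched:
    # One boolean "watched" action per line: user, action name, item.
    print(f"{user}\twatched\t{item}")
```

With min_rating=None every row is kept, which matches the idea that the user caring enough to watch the movie is the important signal.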
Regarding my question about the tags and the content-based indicator, I will try to explain it better. Running spark-rowsimilarity gives us a content-type indicator. The first output line (in the example) is

"3459860b<tab>3459860b 3459860b 6749860c 5959860a 3434860a 3477860a"

This means that in the document of item "3459860b" we have to add a new field (I will call it "tags-indicator") with the content "3459860b 3459860b 6749860c 5959860a 3434860a 3477860a", and we have to do the same for every document returned by spark-rowsimilarity. Now we are ready to issue queries using this content-based indicator. Am I right?

When we want to issue queries we have to do the following. If our user has purchased item 3459860b (continuing with the example), we issue the following query:

field: purchase; q: 3459860b
field: tags-indicator; q: 3459860b

Besides that, if we want the results skewed towards items whose tags are similar to those of the items the user has already purchased (without using the content-based indicator), we can issue the following query:

field: purchase; q: 3459860b
field: tags; q: men long-sleeve chambray clothing casual

Is that OK? Or am I misunderstanding something about how to use the content-based indicator?

Ferran Muñoz
[email protected]

2015-02-27 2:16 GMT+01:00 Pat Ferrel <[email protected]>:

> Long answer:
>
> Preferred tags is an example of an action that would not lead to
> recommendations in any other type of recommender. A user takes many actions
> in your app, not all of them have “purchase” intent behind them. What the
> cross-cooccurrence stuff does is find actions that correlate with the
> action you want to recommend. Don’t get too hung up in that before you
> understand the basics; it is a way to make better use of your data.
>
> The cooccurrence recommender does not use ratings. In fact any Mahout
> recommender that uses LLR ignores ratings. Ratings are very hard to use in
> practice since no two people rate on the same scale and the same person is
> often inconsistent about ratings. It is more important to find an indicator
> or preference and focus on _ranking_ better. Ask yourself if you want to
> predict a rating or show the user the things you think they will like in
> the right order (you can only recommend a fixed number of things after
> all). Not even Netflix, who led us into thinking ratings were important,
> use ratings predictions to make recommendations anymore and they have
> stated this publicly.
>
> Short Answer:
>
> Feed MovieLens in and you will get ranked ratings out of the system (it
> requires a search engine to query, don’t forget). If you want to toss the
> very low ratings the answers might be a little better but the fact that a
> user cared enough to watch the movie is the important thing.
>
>
> On Feb 26, 2015, at 12:08 AM, Ferran Muñoz <[email protected]> wrote:
>
> Hello,
>
> I have read the "Intro to Cooccurrence Recommenders with Spark" of the
> Mahout documentation and I have a question regarding the unified
> recommender query. What does "user's-tags-associated-with-purchases"
> exactly mean? Does it mean that I have to put tags or itemids?
>
> I understand that the "tags" field of each item document contains the tags
> of this particular item. Then, what query do I have to write in order to
> get recommended items using the content-based indicator?
>
> On the other hand, how can I use ratings when computing
> the spark-itemsimilarity? For example, how can I use spark-itemsimilarity
> to get recommendations in MovieLens dataset (it has ratings, not boolean)?
>
> Thank you in advance.
>
> Ferran
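P.S. To make the indexing and query steps from my message concrete, here is a small sketch of how I understand them. It assumes that a spark-rowsimilarity output line has exactly the form quoted in my message (item id, a tab, then space-separated similar item ids). The function names, the plain-dict "document" and the list-of-dicts "query" are hypothetical stand-ins that mirror my "field: ...; q: ..." notation; they are not the API of Mahout or of any search engine.

```python
def parse_indicator_line(line):
    """Split 'item<TAB>sim1 sim2 ...' into (item_id, [sim1, sim2, ...])."""
    item_id, similar = line.rstrip("\n").split("\t", 1)
    return item_id, similar.split()


def add_indicator_field(doc, similar_items, field="tags-indicator"):
    """Return a copy of the item document with the indicator field added."""
    updated = dict(doc)
    updated[field] = " ".join(similar_items)
    return updated


def build_query(purchased_items, fields=("purchase", "tags-indicator")):
    """Use the user's purchase history as the query text for each indicator field."""
    q = " ".join(purchased_items)
    return [{"field": field, "q": q} for field in fields]


# The example output line from the message.
line = "3459860b\t3459860b 3459860b 6749860c 5959860a 3434860a 3477860a"
item_id, similar_items = parse_indicator_line(line)

# Hypothetical item document that already carries its own tags.
doc = {"id": item_id, "tags": "men long-sleeve chambray clothing casual"}
doc = add_indicator_field(doc, similar_items)

# Query for a user whose purchase history is just item 3459860b.
query = build_query(["3459860b"])
for clause in query:
    print(f"field: {clause['field']}; q: {clause['q']}")
```

The second query in my message (skewing by tags directly) would be the same purchase clause plus a clause on the "tags" field whose q is the list of tags rather than item ids.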
