So I've taken another try at using preference values. However, instead of something a user explicitly rates on a scale of 0-5, I am using a user's activity. Certain activities of a user toward an item are negative, and certain are positive.
If I have users 1 and 2 and 3, and product X, and their preferences are as follows:

1, X, -1
2, X, 1
3, X, 10

Clearly 2 and 3 are closer than 2 and 1, because they both like product X, just to varying degrees. However, most distance algorithms I've tried are incorrectly showing 1 and 2 closer because their difference is less.

Am I approaching this wrong? Other than switching to boolean preferences, is there a better way to approach this?

-Will

On Mon, Apr 16, 2012 at 2:35 PM, Will C <[email protected]> wrote:

> Thanks for clearing that up.
>
> On Mon, Apr 16, 2012 at 2:02 PM, Sean Owen <[email protected]> wrote:
>
>> In the case of no ratings, the value you observe is *not* a predicted
>> rating. After all, they are all 1.0 and so can't be used for ranking.
>> The result is actually a sum of similarities, which is why it can be
>> arbitrarily large. It is not supposed to be in [0,1] or anything like
>> that.
>>
>> On Sun, Apr 15, 2012 at 5:47 PM, Will C <[email protected]> wrote:
>> > I have a boolean input dataset, with user, item, and preference. Each
>> > preference is a 1.0 if it exists. Based on this dataset I had used a
>> > Tanimoto Similarity and tried both Boolean Pref User and Item
>> > Recommenders.
>> >
>> > After reading Mahout in Action and several threads on stack overflow, I
>> > saw that the LogLikelihood Similarity model was recommended for boolean
>> > dataset recommenders.
>> >
>> > However, the scores I get for the recommended items using the
>> > LogLikelihood similarity are sometimes much greater than 1.0, even
>> > though none of the input scores are higher than that. I saw scores of
>> > 11.0 being returned for some users' recommendations.
>> >
>> > This is making it very hard for me to use the scoring and estimation
>> > functions. I have switched back to Tanimoto for now, but am I doing
>> > something wrong, or am I incorrect in expecting the recommended scores
>> > and estimated preferences to be in the 0-1.0 range for this dataset?
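
To make the distance problem from the top of this message concrete, here is a minimal sketch in plain Java. It is not Mahout code: the class name and both helper methods are made up for illustration, and it only compares the three example preferences for product X (-1, 1, and 10). It shows that a plain difference puts users 1 and 2 closest, even though only users 2 and 3 fall on the same side of zero.

```java
public class SignedPreferenceDemo {

    // Plain absolute difference between two users' preferences for one item.
    // This is the kind of value a simple distance measure is built from.
    public static double absoluteDifference(double a, double b) {
        return Math.abs(a - b);
    }

    // True when both preferences are on the same side of zero,
    // i.e. both users like the item or both dislike it.
    public static boolean sameSign(double a, double b) {
        return a * b > 0;
    }

    public static void main(String[] args) {
        // Preferences for product X, taken from the example above.
        double user1 = -1.0;
        double user2 = 1.0;
        double user3 = 10.0;

        System.out.println("diff(1,2) = " + absoluteDifference(user1, user2)
                + ", same sign: " + sameSign(user1, user2));
        System.out.println("diff(2,3) = " + absoluteDifference(user2, user3)
                + ", same sign: " + sameSign(user2, user3));
    }
}
```

Users 1 and 2 differ by 2.0 but disagree in sign, while users 2 and 3 differ by 9.0 and agree in sign, which is exactly the mismatch described above.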
