So I've taken another try at using recommendation values.  However, unlike
a rating that a user explicitly gives on a scale of 0-5, I am deriving the
values from a user's activity.  Certain activities of a user toward an item
are negative, and certain are positive.

If I have users 1 and 2 and 3, and product X, and their preferences are as
follows:

1, X, -1
2, X, 1
3, X, 10

Clearly users 2 and 3 are closer than 2 and 1, because they both like
product X, just to varying degrees.  However, most distance algorithms I've
tried incorrectly show 1 and 2 as closer, because the difference between
their values is smaller.

Am I approaching this wrong?  Other than switching to boolean preferences,
is there a better way to approach this?
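(For illustration only, not from the original thread: a minimal sketch of why a
difference-based distance puts users 1 and 2 together, while a direction-based
measure such as uncentered cosine similarity, which keys on the sign of the
preferences rather than their magnitude, groups 2 and 3.  In Mahout itself,
`UncenteredCosineSimilarity` is one class that behaves this way.)

```python
import math

def euclidean_distance(a, b):
    # Difference-based: sensitive to the magnitude of the preference values.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Direction-based: sensitive to the sign (like vs. dislike), not the scale.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Each user's preference vector over the single item X.
user1, user2, user3 = [-1.0], [1.0], [10.0]

print(euclidean_distance(user2, user1))  # 2.0 -> 1 and 2 look "closer"
print(euclidean_distance(user2, user3))  # 9.0
print(cosine_similarity(user2, user1))   # -1.0 -> opposite taste
print(cosine_similarity(user2, user3))   #  1.0 -> same taste
```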

-Will

On Mon, Apr 16, 2012 at 2:35 PM, Will C <[email protected]> wrote:

> Thanks for clearing that up.
>
>
> On Mon, Apr 16, 2012 at 2:02 PM, Sean Owen <[email protected]> wrote:
>
>> In the case of no ratings, the value you observe is *not* a predicted
>> rating. After all, they are all 1.0 and so can't be used for ranking.
>> The result is actually a sum of similarities, which is why it can be
>> arbitrarily large. It is not supposed to be in [0,1] or anything like
>> that.
>>
>> On Sun, Apr 15, 2012 at 5:47 PM, Will C <[email protected]> wrote:
>> > I have a boolean input dataset, with user, item, and preference.  Each
>> > preference is a 1.0 if it exists.  Based on this dataset I had used a
>> > Tanimoto similarity and tried both the boolean-pref user and item
>> > recommenders.
>> >
>> >
>> > After reading Mahout in Action and several threads on Stack Overflow, I
>> > saw that the LogLikelihood similarity model was recommended for boolean
>> > dataset recommenders.
>> >
>> > However, the scores I get for the recommended items using the
>> > LogLikelihood similarity are sometimes much greater than 1.0, even
>> > though none of the input scores are higher than that.  I saw scores of
>> > 11.0 being returned for some users' recommendations.
>> >
>> > This is making it very hard for me to use the scoring and estimation
>> > functions.  I have switched back to Tanimoto for now, but am I doing
>> > something wrong, or am I incorrect in expecting the recommended scores
>> > and estimated preferences to be in the 0-1.0 range for this dataset?
>>
>
>
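(Editorial note, not part of the original thread: a toy sketch of the
"sum of similarities" scoring Sean describes above.  With boolean data every
preference is 1.0, so a candidate item's estimate reduces to the sum of its
similarities to the user's existing items; each pairwise similarity may lie
in [0, 1], but the sum over many items is unbounded.  The `sim` values below
are made up purely for illustration.)

```python
# With boolean prefs, a candidate item's score for a user is (roughly) the
# sum of similarities between that item and the items the user already has.
def score(candidate, user_items, similarity):
    return sum(similarity(candidate, item) for item in user_items)

# Hypothetical pairwise similarities, each individually in [0, 1].
sims = {("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.7}

def sim(x, y):
    return sims.get((x, y)) or sims.get((y, x), 0.0)

print(score("A", ["B", "C", "D"], sim))  # 2.4 -> already above 1.0
```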
