Sebastian,

thanks. i tried the --booleanData option and now i do get some output.

could you let me know what the serious bug in v0.4 is?

thanks.

On Sat, Mar 12, 2011 at 4:12 AM, Sebastian Schelter <s...@apache.org> wrote:
> Hello Jake,
>
> my first advice would be to use the RecommenderJob from the current trunk,
> the 0.4 version has a serious bug unfortunately.
>
> Your toy data is too small to give output, let me explain why.
>
> The first thing that RecommenderJob will do is to compute all pairs of
> similar items (all pairs of items that co-occurred within the preferences of a
> single user):
>
> 10,20
> 10,30
> 10,40
> 30,40
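
As a rough illustration of the pair list above, here is a minimal Python sketch that derives the co-occurring item pairs from the toy data. It is not Mahout's implementation (RecommenderJob does this as Hadoop jobs); the names `toy_data` and `cooccurring_pairs` are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

# Toy preferences from the original question, as (userID, itemID) tuples.
toy_data = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]


def cooccurring_pairs(preferences):
    """Return all item pairs that appear together in at least one user's preferences."""
    items_by_user = defaultdict(set)
    for user, item in preferences:
        items_by_user[user].add(item)

    pairs = set()
    for items in items_by_user.values():
        # Sorting makes (10, 20) and (20, 10) collapse into one canonical pair.
        pairs.update(combinations(sorted(items), 2))
    return sorted(pairs)


print(cooccurring_pairs(toy_data))
```

For the toy data this yields the same four pairs listed above: 10,20 / 10,30 / 10,40 / 30,40.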
>
> The next thing that happens is that RecommenderJob tries to predict how much
> the users like items that might possibly be recommended to them. In order to
> do this for a single user,item pair we need to look at all items similar to
> the "candidate" item that have also been liked by the user. The formula used
> is a weighted sum defined like this:
>
> u = a user
> i = an item not yet rated by u
> N = all items similar to i
>
> Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n)) / sum(all
> n from N: abs(similarity(i,n)))
>
> This formula has one drawback. If we only know a single similar item, the
> prediction will just be the "rating" value for that single similar item. In
> order to avoid this, we throw out all predictions that were based on a
> single item only.
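
The weighted sum and the "single item" rule can be sketched in a few lines of Python. This is only an illustration of the formula as described here, not Mahout's code: the function name `predict`, the similarity value 0.5, and the rating of 1.0 assigned to each toy preference are all assumptions made for the example.

```python
def predict(user_ratings, similarities, candidate):
    """Weighted-sum prediction for one (user, candidate item) pair.

    user_ratings: dict itemID -> rating for a single user
    similarities: dict (itemA, itemB) -> similarity, with itemA < itemB
    Returns None when fewer than two similar rated items are available.
    """
    numerator = 0.0
    denominator = 0.0
    used = 0
    for item, rating in user_ratings.items():
        key = (min(candidate, item), max(candidate, item))
        sim = similarities.get(key)
        if sim is None:
            continue  # item is not in N, the set of items similar to the candidate
        numerator += sim * rating
        denominator += abs(sim)
        used += 1
    # With a single similar item the result would just equal that item's rating,
    # so such predictions are discarded.
    if used < 2 or denominator == 0:
        return None
    return numerator / denominator


# Hypothetical similarity of 0.5 for every co-occurring pair in the toy data.
toy_similarities = {(10, 20): 0.5, (10, 30): 0.5, (10, 40): 0.5, (30, 40): 0.5}
# Assumption: every toy preference is treated as a rating of 1.0.
toy_ratings = {
    1: {10: 1.0, 20: 1.0},
    2: {10: 1.0, 30: 1.0, 40: 1.0},
    3: {10: 1.0, 20: 1.0},
}

toy_predictions = {
    (user, item): predict(ratings, toy_similarities, item)
    for user, ratings in toy_ratings.items()
    for item in (10, 20, 30, 40)
    if item not in ratings
}
print(toy_predictions)
```

Under these assumptions every candidate in the toy data has only one similar item that the user has already rated, so every prediction is discarded.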
>
> Unfortunately, your toy data is so small that no prediction can be based
> on more than one item, so everything is thrown away and the
> output is empty.
>
> As you only have boolean data in your example (no ratings), you could use
> --booleanData to make RecommenderJob treat the input as boolean (it does not
> do this automatically). In that case you should see output as the previously
> described problem doesn't exist for boolean data.
>
> --sebastian
>
> On 12.03.2011 06:03, Jake Vang wrote:
>>
>> hi,
>>
>> i am testing the RecommenderJob. according to the v0.4 javadocs, it
>> requires the format: userID,itemID[,preferencevalue]. i have a very
>> simple input i want to test before running it on the real dataset. my
>> toy data set is as simple as the following lines.
>>
>> 1,10
>> 1,20
>> 2,10
>> 2,30
>> 2,40
>> 3,10
>> 3,20
>>
>> (user 1 likes item 10, user 1 likes item 20, and so on).
>>
>> i then run the job.
>>
>> hadoop jar mahout-core-0.4-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=/input/toy/toydata.txt
>> -Dmapred.output.dir=/output/toy01
>>
>> however, when i look at the results in (part-r-0000), i see nothing.
>> the file is blank. why is this happening?
>>
>> i am running this on cygwin. i can run the hadoop examples correctly.
>> is there something that i am doing wrong?
>
>
