Hello Jake,

my first advice would be to use the RecommenderJob from the current trunk, as the 0.4 version unfortunately has a serious bug.

Your toy data is too small to produce any output. Let me explain why.

The first thing that RecommenderJob does is compute all pairs of similar items (all pairs of items that co-occurred within the preferences of a single user):

10,20
10,30
10,40
30,40
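
To make the pair list above reproducible, here is a minimal Python sketch that derives it from the toy data. It is not the actual RecommenderJob code; the function name cooccurring_pairs is made up for illustration, and it assumes that plain co-occurrence is all that is needed for two items to count as similar.

```python
from itertools import combinations

# Toy preferences from the question, as (userID, itemID) tuples.
prefs = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]

def cooccurring_pairs(prefs):
    """Return every item pair that appears together for at least one user."""
    by_user = {}
    for user, item in prefs:
        by_user.setdefault(user, set()).add(item)
    pairs = set()
    for items in by_user.values():
        # All unordered pairs of items inside one user's preferences.
        pairs.update(combinations(sorted(items), 2))
    return sorted(pairs)

pairs = cooccurring_pairs(prefs)
for a, b in pairs:
    print(f"{a},{b}")
```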

The next thing that happens is that RecommenderJob tries to predict how much the users like items that might possibly be recommended to them. In order to do this for a single (user, item) pair, we need to look at all items similar to the "candidate" item that have also been liked by the user. The formula used is a weighted sum defined like this:

u = a user
i = an item not yet rated by u
N = all items similar to i

Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n)) / sum(all n from N: abs(similarity(i,n)))

This formula has one drawback. If we only know a single similar item, the prediction will just be the "rating" value for that single similar item. In order to avoid this, we throw out all predictions that were based on a single item only.

Unfortunately your toy data is so small that there is no prediction that can be based on more than one item, so everything is thrown away and the output is empty.
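
The drawback is easy to see with numbers. The following Python sketch implements the weighted sum as written above, restricted to the similar items the user has actually rated. The function name predict and all similarity and rating values are invented for illustration; this is not Mahout's implementation.

```python
def predict(similarities, ratings):
    """Weighted-sum prediction for one (user, candidate item) pair.

    similarities: {item n: similarity(i, n)} for the items similar to i
    ratings:      {item n: rating(u, n)} for the items rated by user u
    """
    # Only similar items that the user has rated can contribute.
    common = [n for n in similarities if n in ratings]
    denom = sum(abs(similarities[n]) for n in common)
    if denom == 0:
        return None  # nothing to base a prediction on
    return sum(similarities[n] * ratings[n] for n in common) / denom

# One similar item: the similarity cancels out and the rating comes back unchanged.
single = predict({10: 0.3}, {10: 4.0})

# Two similar items: the result is a real weighted average.
two = predict({10: 0.5, 40: 0.5}, {10: 4.0, 40: 2.0})

print(single, two)
```

With a single similar item, the similarity appears in both numerator and denominator, so the result carries no information beyond the one known rating.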
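
You can check this by counting, for every user and every item the user has not seen yet, how many similar items the user has already rated. The Python sketch below does that for the toy data. It assumes that two items are similar as soon as they co-occur for one user, and the variable names are made up for illustration.

```python
from itertools import combinations

# Toy data from the question: userID -> set of liked itemIDs.
prefs = {1: {10, 20}, 2: {10, 30, 40}, 3: {10, 20}}

# Build the "similar items" relation from co-occurrence (an assumption).
similar = {}
for items in prefs.values():
    for a, b in combinations(sorted(items), 2):
        similar.setdefault(a, set()).add(b)
        similar.setdefault(b, set()).add(a)

all_items = set().union(*prefs.values())

# For each (user, candidate item): how many similar items has the user rated?
support = {}
for user, rated in prefs.items():
    for candidate in sorted(all_items - rated):
        support[(user, candidate)] = len(similar.get(candidate, set()) & rated)

for (user, candidate), count in sorted(support.items()):
    print(f"user {user}, item {candidate}: based on {count} item(s)")
```

Every candidate ends up with exactly one supporting item, so each prediction would be discarded.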

As you only have boolean data in your example (no ratings), you could use --booleanData to make RecommenderJob treat the input as boolean (it does not do this automatically). In that case you should see output, as the previously described problem doesn't exist for boolean data.

--sebastian

On 12.03.2011 06:03, Jake Vang wrote:
hi,

i am testing the RecommenderJob. according to the v0.4 javadocs, it
requires the format: userID,itemID[,preferencevalue]. i have a very
simple input i want to test before running it on the real dataset. my
toy data set is as simple as the following lines.

1,10
1,20
2,10
2,30
2,40
3,10
3,20

(user 1 likes item 10, user 1 likes item 20, and so on).

i then run the job.

hadoop jar mahout-core-0.4-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
-Dmapred.input.dir=/input/toy/toydata.txt
-Dmapred.output.dir=/output/toy01

however, when i look at the results in (part-r-0000), i see nothing.
the file is blank. why is this happening?

i am running this on cygwin. i can run the hadoop examples correctly.
is there something that i am doing wrong?
