Hello Jake,
my first piece of advice would be to use the RecommenderJob from the
current trunk; unfortunately, the 0.4 version has a serious bug.
Your toy data is too small to produce any output; let me explain why.
The first thing RecommenderJob does is compute all pairs of similar
items (all pairs of items that co-occurred within the preferences of a
single user):
10,20
10,30
10,40
30,40
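To make the origin of those pairs concrete: user 1's preferences
{10, 20} contribute the pair 10,20, user 2's {10, 30, 40} contribute
10,30, 10,40 and 30,40, and user 3's {10, 20} contribute 10,20 again.
A tiny standalone Java sketch (not the actual Mahout code, just an
illustration of the per-user pair generation) would look like this:

  import java.util.*;

  public class CooccurrencePairs {
    public static void main(String[] args) {
      // toy data: userID -> items the user expressed a preference for
      Map<Integer, List<Integer>> prefs =
          new LinkedHashMap<Integer, List<Integer>>();
      prefs.put(1, Arrays.asList(10, 20));
      prefs.put(2, Arrays.asList(10, 30, 40));
      prefs.put(3, Arrays.asList(10, 20));

      // every pair of items co-occurring within a single user's preferences
      Set<String> pairs = new LinkedHashSet<String>();
      for (List<Integer> items : prefs.values()) {
        for (int a = 0; a < items.size(); a++) {
          for (int b = a + 1; b < items.size(); b++) {
            pairs.add(items.get(a) + "," + items.get(b));
          }
        }
      }
      // prints 10,20 / 10,30 / 10,40 / 30,40
      for (String pair : pairs) {
        System.out.println(pair);
      }
    }
  }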
The next thing that happens is that RecommenderJob tries to predict how
much the users would like the items that might possibly be recommended
to them. To do this for a single (user, item) pair, we need to look at
all items similar to the "candidate" item that have also been liked by
the user. The formula used is a weighted sum defined like this:
u = a user
i = an item not yet rated by u
N = all items similar to i

Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n)) /
                  sum(all n from N: abs(similarity(i,n)))
This formula has one drawback: if we only know a single similar item,
the prediction will just be the "rating" value for that single similar
item. To avoid this, we throw out all predictions that were based on a
single item only.
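Roughly, the per-item computation (including that rule) looks like the
following simplified sketch. This is not the actual Mahout
implementation, and the names are made up for illustration:

  // simplified sketch of the weighted sum described above;
  // 'similarities' maps each item n similar to i to similarity(i,n),
  // 'ratings' holds the ratings of user u for the items u has rated
  static Double predict(Map<Long, Double> similarities,
                        Map<Long, Double> ratings) {
    double numerator = 0.0;
    double denominator = 0.0;
    int usedItems = 0;
    for (Map.Entry<Long, Double> entry : similarities.entrySet()) {
      Double rating = ratings.get(entry.getKey());
      if (rating != null) { // u has rated this similar item
        numerator += entry.getValue() * rating;
        denominator += Math.abs(entry.getValue());
        usedItems++;
      }
    }
    // predictions based on a single similar item only are thrown away
    return usedItems > 1 ? numerator / denominator : null;
  }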
Unfortunately, your toy data is so small that no prediction can be
based on more than one item, so everything is thrown away and the
output is empty.
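Take user 1 and candidate item 30 as an example: the items similar to
30 are 10 and 40 (from the pairs above), but user 1 has only expressed
a preference for 10, so that prediction rests on a single item and is
discarded. The same happens for every other (user, item) candidate in
your data.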
As you only have boolean data in your example (no ratings), you could
use --booleanData to make RecommenderJob treat the input as boolean (it
does not do this automatically). In that case you should see output,
because the problem described above doesn't exist for boolean data.
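If I remember the option parsing correctly, the flag takes a true/false
value, so the call would look something like this (your original
command with the flag appended):

  hadoop jar mahout-core-0.4-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.input.dir=/input/toy/toydata.txt \
    -Dmapred.output.dir=/output/toy01 \
    --booleanData true

(With the trunk version, use the job jar built from trunk instead of
mahout-core-0.4-job.jar.)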
--sebastian
On 12.03.2011 06:03, Jake Vang wrote:
Hi,
I am testing the RecommenderJob. According to the v0.4 javadocs, it
requires the format userID,itemID[,preferencevalue]. I have a very
simple input I want to test before running it on the real dataset. My
toy data set is as simple as the following lines.
1,10
1,20
2,10
2,30
2,40
3,10
3,20
(user 1 likes item 10, user 1 likes item 20, and so on).
I then run the job.
hadoop jar mahout-core-0.4-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/input/toy/toydata.txt \
  -Dmapred.output.dir=/output/toy01
However, when I look at the results (in part-r-0000), I see nothing.
The file is blank. Why is this happening?
I am running this on Cygwin. I can run the Hadoop examples correctly.
Is there something that I am doing wrong?