Hi Jake,
https://issues.apache.org/jira/browse/MAHOUT-610 has the details about
the bug.
--sebastian
On 14.03.2011 16:24, Jake Vang wrote:
Sebastian,
thanks. i tried the --booleanData option and now i do get some output.
could you let me know what the serious bug in v0.4 is?
thanks.
On Sat, Mar 12, 2011 at 4:12 AM, Sebastian Schelter<s...@apache.org> wrote:
Hello Jake,
my first advice would be to use the RecommenderJob from the current trunk;
unfortunately, the 0.4 version has a serious bug.
Your toy data is too small to produce any output. Let me explain why.
The first thing that RecommenderJob does is compute all pairs of
similar items (all pairs of items that co-occurred within the preferences of a
single user):
10,20
10,30
10,40
30,40
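To make the pair computation concrete, here is a minimal Python sketch that derives the same four pairs from the toy data. It only illustrates the co-occurrence idea and is not Mahout's actual implementation; the function name cooccurring_pairs is made up for this example.

```python
from itertools import combinations

# Toy preferences from the original question: (userID, itemID).
prefs = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]

def cooccurring_pairs(prefs):
    """Return the sorted list of item pairs that share at least one user."""
    items_by_user = {}
    for user, item in prefs:
        items_by_user.setdefault(user, set()).add(item)
    pairs = set()
    for items in items_by_user.values():
        # Every unordered pair of items within one user's preferences co-occurs.
        pairs.update(combinations(sorted(items), 2))
    return sorted(pairs)

pairs = cooccurring_pairs(prefs)
for a, b in pairs:
    print(f"{a},{b}")
```

Users 1 and 3 both contribute the pair 10,20, which is why it appears only once in the list.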
The next thing that happens is that RecommenderJob tries to predict how much
the users like items that might possibly be recommended to them. In order to
do this for a single (user, item) pair, we need to look at all items similar to
the "candidate" item that have also been liked by the user. The formula used
is a weighted sum defined like this:
u = a user
i = an item not yet rated by u
N = all items similar to i that u has rated
Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n))
                  / sum(all n from N: abs(similarity(i,n)))
This formula has one drawback. If we only know a single similar item, the
prediction will just be the "rating" value for that single similar item. In
order to avoid this, we throw out all predictions that were based on a
single item only.
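The weighted sum and its drawback can be sketched in a few lines of Python. This is an illustration of the formula as written above, not Mahout's code; the function name predict and all similarity and rating values are made up for the example.

```python
def predict(similarities, ratings):
    """Weighted-sum prediction for one (user, candidate item) pair.

    similarities: {n: similarity(i, n)} for the items n similar to candidate i
    ratings:      {n: rating(u, n)} for the items the user u has rated
    Returns None when the user has rated none of the similar items.
    """
    # Only similar items that the user has actually rated contribute.
    neighbours = [n for n in similarities if n in ratings]
    denominator = sum(abs(similarities[n]) for n in neighbours)
    if denominator == 0:
        return None
    numerator = sum(similarities[n] * ratings[n] for n in neighbours)
    return numerator / denominator

# Hypothetical values: one similar item with similarity 0.25, rated 4.0.
single = predict({10: 0.25}, {10: 4.0})

# Hypothetical values: two similar items, equally similar, rated 4.0 and 2.0.
two = predict({10: 0.5, 20: 0.5}, {10: 4.0, 20: 2.0})

print(single, two)
```

With a single similar item of positive similarity, the similarity cancels out and the result is just the user's rating of that item (4.0 here), whatever the similarity value is. With two items the prediction becomes a genuine weighted average (3.0 here).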
Unfortunately, your toy data is so small that there is no prediction that
can be based on more than one item, so everything is thrown away and the
output is empty.
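This can be checked by hand for the toy data. The following Python sketch (again only an illustration, with made-up variable names, not what RecommenderJob runs internally) lists, for every user and every candidate item, the similar items that the user already likes:

```python
from itertools import combinations

# Toy preferences from the original question: (userID, itemID).
prefs = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]

items_by_user = {}
for user, item in prefs:
    items_by_user.setdefault(user, set()).add(item)

# similar[i] = items that co-occur with i for at least one user.
similar = {}
for items in items_by_user.values():
    for a, b in combinations(sorted(items), 2):
        similar.setdefault(a, set()).add(b)
        similar.setdefault(b, set()).add(a)

# support[(user, candidate)] = similar items the user already likes,
# i.e. the items a prediction for that pair would be based on.
all_items = set(similar)
support = {}
for user, liked in items_by_user.items():
    for candidate in sorted(all_items - liked):
        support[(user, candidate)] = sorted(similar[candidate] & liked)

for (user, candidate), basis in sorted(support.items()):
    print(f"user {user}, candidate {candidate}: based on {basis}")
```

Every one of the five (user, candidate) pairs turns out to be supported by item 10 alone, so each prediction rests on a single item and is discarded.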
As you only have boolean data in your example (no ratings), you could use
--booleanData to make RecommenderJob treat the input as boolean (it does not
do this automatically). In that case you should see output, because the problem
described above doesn't exist for boolean data.
--sebastian
On 12.03.2011 06:03, Jake Vang wrote:
hi,
i am testing the RecommenderJob. according to the v0.4 javadocs, it
requires the format: userID,itemID[,preferencevalue]. i have a very
simple input i want to test before running it on the real dataset. my
toy data set is as simple as the following lines.
1,10
1,20
2,10
2,30
2,40
3,10
3,20
(user 1 likes item 10, user 1 likes item 20, and so on).
i then run the job.
hadoop jar mahout-core-0.4-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/input/toy/toydata.txt \
  -Dmapred.output.dir=/output/toy01
however, when i look at the results in (part-r-0000), i see nothing.
the file is blank. why is this happening?
i am running this on cygwin. i can run the hadoop examples correctly.
is there something that i am doing wrong?