Sebastian, thanks. i tried the --booleanData option and now i do get some output.
could you let me know what the serious bug in v0.4 is? thanks.

On Sat, Mar 12, 2011 at 4:12 AM, Sebastian Schelter <s...@apache.org> wrote:
> Hello Jake,
>
> my first advice would be to use the RecommenderJob from the current trunk,
> the 0.4 version has a serious bug unfortunately.
>
> Your toy data is too small to give output, let me explain why.
>
> The first thing that RecommenderJob will do is to compute all pairs of
> similar items (all pairs of items that co-occurred within the preferences
> of a single user):
>
> 10,20
> 10,30
> 10,40
> 30,40
>
> The next thing that happens is that RecommenderJob tries to predict how much
> the users like items that might possibly be recommended to them. In order to
> do this for a single (user, item) pair we need to look at all items similar
> to the "candidate" item that have also been liked by the user. The formula
> used is a weighted sum defined like this:
>
> u = a user
> i = an item not yet rated by u
> N = all items similar to i that have been rated by u
>
> Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n))
>                   / sum(all n from N: abs(similarity(i,n)))
>
> This formula has one drawback. If we only know a single similar item, the
> prediction will just be the "rating" value for that single similar item. In
> order to avoid this, we throw out all predictions that were based on a
> single item only.
>
> Unfortunately your toy data is so small that there is no prediction that
> can be based on more than one item, so everything is thrown away and the
> output is empty.
>
> As you only have boolean data in your example (no ratings), you could use
> --booleanData to make RecommenderJob treat the input as boolean (it does not
> do this automatically). In that case you should see output as the previously
> described problem doesn't exist for boolean data.
>
> --sebastian
>
> On 12.03.2011 06:03, Jake Vang wrote:
>>
>> hi,
>>
>> i am testing the RecommenderJob. according to the v0.4 javadocs, it
>> requires the format: userID,itemID[,preferencevalue]. i have a very
>> simple input i want to test before running it on the real dataset. my
>> toy data set is as simple as the following lines.
>>
>> 1,10
>> 1,20
>> 2,10
>> 2,30
>> 2,40
>> 3,10
>> 3,20
>>
>> (user 1 likes item 10, user 1 likes item 20, and so on).
>>
>> i then run the job.
>>
>> hadoop jar mahout-core-0.4-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=/input/toy/toydata.txt
>> -Dmapred.output.dir=/output/toy01
>>
>> however, when i look at the results in (part-r-0000), i see nothing.
>> the file is blank. why is this happening?
>>
>> i am running this on cygwin. i can run the hadoop examples correctly.
>> is there something that i am doing wrong?
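
For anyone who finds this thread later: the weighted sum and the "throw out predictions based on a single item" rule that Sebastian describes can be checked by hand on the toy data. Below is a rough Python sketch, not Mahout's actual code. It assumes a raw co-occurrence count as the similarity (the thread does not say which measure RecommenderJob uses) and a placeholder rating of 1.0 for every liked item. All function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy preferences from the original message: (userID, itemID)
PREFS = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]


def items_by_user(prefs):
    """Group the liked items per user."""
    liked = defaultdict(set)
    for user, item in prefs:
        liked[user].add(item)
    return liked


def cooccurrence_counts(liked):
    """For each unordered item pair, count the users who liked both items."""
    counts = defaultdict(int)
    for items in liked.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


def similarity(counts, i, n):
    # ASSUMPTION: co-occurrence count stands in for the real similarity measure.
    return counts.get((min(i, n), max(i, n)), 0)


def predict(liked, counts, user, item, rating=lambda u, n: 1.0):
    """Weighted-sum prediction; None when fewer than two similar items back it.

    ASSUMPTION: `rating` defaults to 1.0 because the toy data has no ratings.
    """
    # N = items similar to the candidate that the user has also liked
    neighbours = [n for n in liked[user] if similarity(counts, item, n) > 0]
    if len(neighbours) < 2:
        return None  # based on a single item (or none): thrown away
    num = sum(similarity(counts, item, n) * rating(user, n) for n in neighbours)
    den = sum(abs(similarity(counts, item, n)) for n in neighbours)
    return num / den


def all_predictions(prefs):
    """Predict every (user, item) pair where the user has not liked the item."""
    liked = items_by_user(prefs)
    counts = cooccurrence_counts(liked)
    all_items = {item for _, item in prefs}
    return {
        (user, item): predict(liked, counts, user, item)
        for user in liked
        for item in all_items - liked[user]
    }


if __name__ == "__main__":
    for pair, value in sorted(all_predictions(PREFS).items()):
        print(pair, value)
```

Under these assumptions every candidate pair in the toy data has exactly one similar item liked by the user, so every prediction is discarded, which matches the empty output file. Adding one more line such as 1,30 gives user 1 two liked items similar to item 40, and a prediction for (1, 40) survives.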