Ideally, you would file a bug and see whether it still happens with trunk. I think the problem comes from the fact that we only use a limited number of each user's preferences in the final recommendation phase. As a result, an item whose preference we dropped can still come up as a recommendation.
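To illustrate the suspected effect, here is a toy sketch in Python. It is not Mahout's code: the names `prefs_by_user`, `max_prefs_per_user` and `leaked_items` are made up for this example, and the way the cap chooses which preferences to keep (the first N in sorted order) is an assumption made only to keep the example deterministic.

```python
from collections import defaultdict


def prefs_by_user(triples):
    """Group (user, item, value) triples into {user: {item: value}}."""
    prefs = defaultdict(dict)
    for user, item, value in triples:
        prefs[user][item] = value
    return prefs


def max_prefs_per_user(prefs):
    """Largest number of preferences held by any single user."""
    return max((len(items) for items in prefs.values()), default=0)


def leaked_items(user_items, cap):
    """Items that a capped exclusion filter would fail to exclude.

    Toy model (an assumption, not Mahout's logic): only the first `cap`
    preferences in sorted item order are kept, and only the kept ones
    are removed from the recommendation candidates.
    """
    kept = set(sorted(user_items)[:cap])
    return set(user_items) - kept


# Hypothetical input in the same shape as a preference file of triples.
triples = [
    ("u1", "a", 1.0),
    ("u1", "b", 1.0),
    ("u1", "c", 1.0),
    ("u2", "a", 1.0),
]
prefs = prefs_by_user(triples)

print(max_prefs_per_user(prefs))       # 3
print(leaked_items(prefs["u1"], 2))    # {'c'}: already preferred, yet not excluded
print(leaked_items(prefs["u1"], 3))    # set(): nothing leaks once the cap covers all prefs
```

In this toy model the leak disappears once the cap is at least the value returned by `max_prefs_per_user`, which is the idea behind setting --maxPrefsPerUser to the maximum number of interactions per user.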
Best,
Sebastian

2013/7/31 Rafal Lukawiecki <ra...@projectbotticelli.com>

> Dear Sebastian,
>
> It looks like setting --maxPrefsPerUser 10000 has resolved the issue in
> our case; it seems that the most preferences a user had was just about 5000,
> so I doubled it just in case, but when I operationalise this model, I will
> make sure to calculate the actual max number of preferences and set the
> parameter accordingly. I will double-check the result set to make sure the
> issue is really gone, as I have only checked the few cases where we have
> spotted a recommendation of a previously preferred item.
>
> Would you like me to file a bug, and would you like me to test it on 0.8
> or another version? I am using 0.7.
>
> Thanks for your kind support.
> Rafal
> --
> Rafal Lukawiecki
> Strategic Consultant and Director
> Project Botticelli Ltd
>
> On 31 Jul 2013, at 06:22, Sebastian Schelter <ssc.o...@googlemail.com>
> wrote:
>
> Hi Rafal,
>
> can you try to set the option --maxPrefsPerUser to the maximum number of
> interactions per user and see if you still get the error?
>
> Best,
> Sebastian
>
> On 30.07.2013 19:29, Rafal Lukawiecki wrote:
> > Thank you Sebastian. The data set is not that large, as we are running
> > tests on a subset. It is about 24k users, 40k items, the preference file
> > has 65k preferences as triples. This was using Similarity Cooccurrence.
> >
> > I can see if I could anonymise the data set to share if that would be
> > helpful.
> >
> > Thanks for your kind help.
> >
> > Rafal
> > --
> > Rafal Lukawiecki
> > Pardon my brevity, sent from a telephone.
> >
> > On 30 Jul 2013, at 18:18, "Sebastian Schelter" <s...@apache.org> wrote:
> >
> >> Hi Rafal,
> >>
> >> can you issue a ticket for this problem at
> >> https://issues.apache.org/jira/browse/MAHOUT ? We have unit-tests that
> >> check whether this happens and currently they work fine. I can only
> >> imagine that the problem occurs in larger datasets where we sample the
> >> data in some places. Can you describe a scenario/dataset where this
> >> happens?
> >>
> >> Best,
> >> Sebastian
> >>
> >> 2013/7/30 Rafal Lukawiecki <ra...@projectbotticelli.com>
> >>
> >>> I'm new here, just registered. Many thanks to everyone for working on an
> >>> amazing piece of software, thank you for building Mahout and for your
> >>> support. My apologies if this is not the right place to ask the
> >>> question. I have searched for the issue, and I can see this problem has
> >>> been reported here:
> >>>
> >>> http://stackoverflow.com/questions/13822455/apache-mahout-distributed-recommender-recommends-already-rated-items
> >>>
> >>> Unfortunately, the trail leads to the newsgroups, and I have not found a
> >>> way, yet, to get an answer from them, without asking you.
> >>>
> >>> Essentially, I am running a Hadoop RecommenderJob from Mahout 0.7, and I
> >>> am finding that it is recommending items that the user has already
> >>> expressed a preference for in their input file. I understand that this
> >>> should not be happening, and I am not sure if there is a known fix or if
> >>> I should be looking for a workaround (such as using the entire input as
> >>> the filterFile).
> >>>
> >>> I will double-check that there is no error on my side, but so far it
> >>> does not seem that way.
> >>>
> >>> Many thanks and my regards from Ireland,
> >>> Rafal Lukawiecki
> >>>
> >>> --
> >>> Rafal Lukawiecki
> >>> Strategic Consultant and Director
> >>> Project Botticelli Ltd