I agree that randomness is good to introduce, but I disagree that it is only for evaluation purposes. In particular, I find it useful for smoothing out some of the discontinuities that result from the incestuous relationship between recommendation results and later recommendations. Without this, you can quickly wind up with your recommendations containing a very strong echo of the first page of your previous recommendations. By dithering result orders, you can adjust the exploration/exploitation trade-off in your system. In anecdotal reports, I have heard of and seen very dramatic improvements in recommendation quality from dithering.
That aside, a useful way to do the reordering is to generate a synthetic score for each item based on its rank r in the original sorted result set:

    double newScore = Math.exp(-r / scale) + fuzz * random.nextDouble();

(Make sure scale is a double, or r / scale will be truncated by integer division.) The size of fuzz determines where the reordering will occur: the results at ranks r and r+1 can swap only when the gap between their base scores, exp(-r/scale) - exp(-(r+1)/scale), is smaller than fuzz, since the two noise terms differ by at most fuzz. For instance, with scale = 3, 1 and 10 and fuzz = 0.1, here are some sample reorderings:

    > order(-exp(-(1:30)/3) + 0.1 * runif(30))
     [1] 1 2 3 4 5 6 8 7 9 13 12 26 14 28 27 24 17 10 11 20 29 19 16 15 23 22 25 30 21 18
    > order(-exp(-(1:30)/3) + 0.1 * runif(30))
     [1] 1 2 3 4 6 5 11 7 8 21 20 29 16 24 25 14 19 26 9 15 13 28 10 22 12 17 27 18 30 23
    > order(-exp(-(1:30)/3) + 0.1 * runif(30))
     [1] 1 2 3 4 6 5 7 8 12 16 26 18 24 21 9 22 28 10 11 19 29 30 15 20 14 23 17 25 13 27
    > order(-exp(-(1:30)/3) + 0.1 * runif(30))
     [1] 1 2 3 4 5 6 7 8 10 11 28 20 26 18 29 19 22 24 25 9 15 21 27 30 13 16 14 12 17 23
    > order(-exp(-(1:30)) + 0.1 * runif(30))
     [1] 1 2 17 23 19 11 8 21 3 28 12 22 25 13 30 9 27 24 7 4 14 29 10 5 6 20 26 15 16 18
    > order(-exp(-(1:30)) + 0.1 * runif(30))
     [1] 1 2 15 25 20 29 14 5 7 16 10 3 18 13 17 23 8 26 22 12 21 24 4 30 9 27 28 19 11 6
    > order(-exp(-(1:30)) + 0.1 * runif(30))
     [1] 1 2 19 3 5 28 23 24 12 25 7 22 17 21 4 6 20 13 16 29 9 14 30 10 27 8 18 15 26 11
    > order(-exp(-(1:30)/10) + 0.1 * runif(30))
     [1] 1 2 3 5 4 7 6 8 9 10 11 12 13 14 16 18 15 22 19 17 23 24 25 20 21 29 30 27 28 26
    > order(-exp(-(1:30)/10) + 0.1 * runif(30))
     [1] 1 2 3 4 5 6 7 8 9 10 12 11 13 16 14 18 15 20 17 21 19 26 29 25 30 22 24 23 27 28
    > order(-exp(-(1:30)/10) + 0.1 * runif(30))
     [1] 1 2 3 5 4 7 6 8 9 12 11 10 13 16 14 15 20 19 17 22 25 18 21 28 24 23 29 26 27 30

As you can see, with scale = 1, only the first 2 results are stable and very deep results can be surfaced. With scale = 10, reordering only becomes strong past the first 10-20 results.
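To make the recipe concrete, here is a small self-contained Java sketch of the same idea (class and method names are mine for illustration, not part of Mahout): it assigns each item the synthetic score exp(-r / scale) + fuzz * uniform and re-sorts by it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class Dither {
    /**
     * Returns a dithered copy of a list that is assumed to be sorted
     * best-first. Larger fuzz or smaller scale means more exploration.
     */
    public static <T> List<T> dither(List<T> ranked, double scale, double fuzz, Random random) {
        int n = ranked.size();
        final double[] score = new double[n];
        Integer[] order = new Integer[n];
        for (int r = 0; r < n; r++) {
            // Synthetic score from rank r plus uniform noise in [0, fuzz).
            // r / scale is floating-point division because scale is a double.
            score[r] = Math.exp(-r / scale) + fuzz * random.nextDouble();
            order[r] = r;
        }
        // Sort the ranks by synthetic score, descending.
        Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        List<T> result = new ArrayList<>(n);
        for (int r : order) {
            result.add(ranked.get(r));
        }
        return result;
    }
}
```

With fuzz = 0 the order is returned unchanged; with scale = 3 and fuzz = 0.1 the top result stays put (its score gap to rank 2 is about 0.28, larger than the noise) while deeper results shuffle, matching the R samples above.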
I usually use scale = 3 because result pages are usually 20 items long. With shorter result pages, scale = 2 is probably justified.

On Sat, Jul 2, 2011 at 12:56 AM, Sean Owen <sro...@gmail.com> wrote:
> Yes, it's a good idea. Usually it serves a purpose for evaluation
> only. You know the relative strength of recommendations, and know how
> much ranking them 1st, 2nd, 3rd, etc. biases the user to click on them.
> So you can predict how many clicks each should relatively get. And you
> can easily pull up recommendation #10 if you want to see if it gets
> unusually more clicks than you'd expect. This tells you there's
> something suboptimal about the recs if so.
>
> Don't make similarity change randomly. It is supposed to have
> properties like symmetry and transitivity, which would then break.
>
> You can make the neighborhood pick other users, yes. But I think the
> most direct way to do this is to reorder the final recommendations.
> That works for any implementation. So I would do #2.
>
> But again I would not do this just for its own sake; on its face, it
> hurts recommendation quality. Do so if you are using it to evaluate
> quality.
>
> On Fri, Jul 1, 2011 at 7:42 PM, Salil Apte <sa...@offlinelabs.com> wrote:
> > My first post to the Mahout group. First, Mahout devs, you have
> > created something great, so thanks!
> >
> > I have inherited some Mahout code and am trying to make some
> > improvements. I was hoping to get some guidance.
> >
> > 1. We are using the NearestNUserNeighborhood class for neighborhood
> > calculations. While I want to use the similarity metrics provided in
> > Mahout, I also want to introduce some randomness. In effect, I want
> > to include a few people in the final nearest-neighbors set who are
> > not actually that close. That way, my recommender will include some
> > outliers in the results, which is a desirable property for our
> > recommender. What's the best way of doing this?
> > I can of course implement my own similarity metric (which could
> > internally use PearsonCorrelationSimilarity) and then randomly give
> > a high correlation number to certain people. But is there a better
> > way?
> >
> > 2. I also want to introduce some randomness into the final
> > recommended set. I am thinking I can do this by creating a custom
> > IDRescorer and randomly bumping up the score for some of the items.
> > This will of course require some tweaking (how often an item gets a
> > bump, how much of a bump it gets, etc.), but is there a better way
> > of going about this?
> >
> > Thanks for the help!
> >
> > -Salil
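For what it's worth, Salil's idea #2 can be sketched roughly like this. Mahout's org.apache.mahout.cf.taste.recommender.IDRescorer interface is redeclared inline here only so the sketch compiles without Mahout on the classpath; the bump parameters and factory method are illustrative, and note that Sean's advice above (and the rank-based dithering at the top of this mail) is usually preferable to per-item random bumps because the rescorer never sees an item's rank:

```java
import java.util.Random;

public class RandomBumpRescorer {
    // Minimal stand-in for org.apache.mahout.cf.taste.recommender.IDRescorer,
    // declared here only so this sketch compiles standalone.
    interface IDRescorer {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    /**
     * Returns a rescorer that, with the given probability, adds a random
     * bump of up to bumpAmount to an item's score (illustrative only).
     */
    static IDRescorer randomBump(double bumpProbability, double bumpAmount, Random random) {
        return new IDRescorer() {
            @Override
            public double rescore(long id, double originalScore) {
                if (random.nextDouble() < bumpProbability) {
                    return originalScore + bumpAmount * random.nextDouble();
                }
                return originalScore;
            }

            @Override
            public boolean isFiltered(long id) {
                return false; // never filter; we only perturb scores
            }
        };
    }
}
```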