(Agree, and the sampling happens at the user level now -- so if you sample one of these users, it slows down a lot. The spirit of the proposed change is to make sampling more fine-grained, at the individual item level. That certainly seems to fix this. A rough sketch of what I mean is below, before the quoted thread.)
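
For concreteness, a minimal sketch of item-level down-sampling over a plain user -> item-set map. This is not the Mahout DataModel API; the class name ItemLevelSampler, the maxItemsPerUser cap, and the map-of-sets representation are all illustrative assumptions. The point is just to bound n_max before the cooccurrence computation ever sees the data:

import java.util.*;

/**
 * Hypothetical standalone sketch (not Mahout code): cap the number of items
 * retained per user so that n_max, and therefore the O(n_max^2) cooccurrence
 * cost, stays bounded.
 */
public class ItemLevelSampler {

  private final int maxItemsPerUser;
  private final Random random;

  public ItemLevelSampler(int maxItemsPerUser, long seed) {
    this.maxItemsPerUser = maxItemsPerUser;
    this.random = new Random(seed);
  }

  /** Returns a copy of userData where no user keeps more than maxItemsPerUser items. */
  public Map<Long, Set<Long>> downSample(Map<Long, Set<Long>> userData) {
    Map<Long, Set<Long>> sampled = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> entry : userData.entrySet()) {
      List<Long> items = new ArrayList<>(entry.getValue());
      if (items.size() > maxItemsPerUser) {
        Collections.shuffle(items, random);          // uniform random sample of this user's items
        items = items.subList(0, maxItemsPerUser);   // keep only a bounded number of them
      }
      sampled.put(entry.getKey(), new HashSet<>(items));
    }
    return sampled;
  }
}
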
On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]> wrote:
> This may or may not help much. My guess is that the improvement will be
> very modest.
>
> The most serious problem is going to be recommendations for anybody who has
> rated one of these excessively popular items. That item will bring in a
> huge number of other users and thus a huge number of items to consider. If
> you down-sample ratings of the prolific users and kill super-common items,
> I think you will see much more improvement than simply eliminating the
> singleton users.
>
> The basic issue is that cooccurrence based algorithms have run-time
> proportional to O(n_max^2) where n_max is the maximum number of items per
> user.
>
> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
>
> > This is why I'm looking now into improving GenericBooleanPrefDataModel to
> > not take into account users which made one interaction under the
> > 'preferenceForItems' Map. What do you think about this approach?
> >
> >
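
(To make the quoted O(n_max^2) point concrete: counting cooccurrences for a single user is essentially a loop over all pairs of that user's items, so one user with n items contributes roughly n^2/2 updates. This is a standalone illustration, not Mahout code; the names are made up:)

import java.util.*;

/** Illustrative only: a user with n items contributes O(n^2) pair updates. */
public class CooccurrenceCost {
  static void countPairs(List<Long> itemsOfOneUser,
                         Map<Long, Map<Long, Integer>> cooccurrence) {
    for (int i = 0; i < itemsOfOneUser.size(); i++) {
      for (int j = i + 1; j < itemsOfOneUser.size(); j++) {
        long a = itemsOfOneUser.get(i);
        long b = itemsOfOneUser.get(j);
        // increment the symmetric cooccurrence count for the pair (a, b)
        cooccurrence.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
        cooccurrence.computeIfAbsent(b, k -> new HashMap<>()).merge(a, 1, Integer::sum);
      }
    }
  }
}
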
