Combining the latest commits with my optimized SamplingCandidateItemsStrategy (http://pastebin.com/6n9C8Pw1), I achieved satisfying results: all the queries completed in under one second.
Sebastian, I took a look at your patch and I think it's more practical than the current SamplingCandidateItemsStrategy. However, it still doesn't put a strict cap on the number of possible item IDs the way my implementation does. Perhaps there is room for both implementations?

On Sun, Dec 4, 2011 at 11:13 AM, Sebastian Schelter <s...@apache.org> wrote:

> I created a jira to supply a non-distributed counterpart of the
> sampling that is done in the distributed item similarity computation:
>
> https://issues.apache.org/jira/browse/MAHOUT-914
>
>
> 2011/12/2 Sean Owen <sro...@gmail.com>:
> > For your purposes, it's LogLikelihoodSimilarity. I made similar changes in
> > other files. Ideally, just svn update to get all recent changes.
> >
> > On Fri, Dec 2, 2011 at 6:43 PM, Daniel Zohar <disso...@gmail.com> wrote:
> >
> >> Sean, can you tell me which files have you committed the changes to? Thanks