MinHash/ItemBased

Vishal Santoshi Tue, 25 Oct 2011 07:00:04 -0700

Hello folks,

Item-based recommendations for my dataset are excruciatingly slow on an
8-node cluster. Yes, the number of items is big, and the dataset churn does
not allow for a long asynchronous process. Recommendations cannot be stale
(a 30-minute delay is stale). I have tried out MinHash clustering and it is
scalable, but without a "degree of association" with the multiple clusters
any user may belong to, it seems less tight than a pure item-based (and thus
similarity probability) algorithm.


Any ideas on how we can pull this off, where:

* The item churn is frequent. New items enter the dataset all the time.
* There is no "preference" signal apart from opt-in.
* Anonymous users enter the system almost all the time.


Scale is very important.

I am leaning towards MinHash, combined with additional algorithms that are
executed offline, and co-occurrence.
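
To make the "degree of association" idea concrete, here is a rough Python
sketch of comparing MinHash signatures position by position instead of only
checking cluster membership. It is an illustration under assumptions, not the
implementation of any particular library: the function names, the use of
BLAKE2b as the hash family, and the signature length of 64 are all
hypothetical choices. The fraction of matching signature positions is an
estimate of the Jaccard similarity between two opt-in item sets.

```python
import hashlib

NUM_HASHES = 64  # hypothetical signature length; larger = lower variance


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family (assumed choice: BLAKE2b, 64-bit)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=NUM_HASHES):
    """Return the MinHash signature of a non-empty collection of item ids."""
    items = list(items)
    if not items:
        raise ValueError("item set must be non-empty")
    # For each seeded hash function, keep the minimum hash over the set.
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree (Jaccard estimate)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, only for checking the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    user_a = {"item1", "item2", "item3", "item4"}
    user_b = {"item3", "item4", "item5", "item6"}
    sig_a = minhash_signature(user_a)
    sig_b = minhash_signature(user_b)
    print("estimated:", estimated_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(user_a, user_b))
```

Since a signature depends only on the set's own items, a new item or a new
anonymous user can be hashed on arrival without recomputing anything else,
which is the property that makes this attractive under heavy churn.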
