Re: MinHash/ItemBased

Sean Owen Tue, 25 Oct 2011 07:07:55 -0700

Can you put any more numbers around this? How slow is slow, and how big is big?
Which part of Mahout are you using -- or are you using Mahout at all?


Item-based recommendation sounds fine. Anonymous users aren't a
problem as long as you can distinguish them reasonably.
I think your challenge is to have a data model that quickly drops
data for old items and brings new items in.

Is this small enough to do in memory? That's the simple, easy place to start.
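
To make the in-memory option concrete, here is a minimal sketch in plain
Python. It is not Mahout code: the class name WindowedCooccurrence, its
methods, and the window length are all hypothetical, chosen only for
illustration. It assumes opt-in events arrive in time order, keeps them in a
queue, drops anything older than the window, and scores candidate items by
simple co-occurrence counts. Anonymous users are keyed by whatever session ID
distinguishes them.

```python
from collections import defaultdict, deque


class WindowedCooccurrence:
    """Hypothetical in-memory model: boolean opt-ins over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()                # (timestamp, user, item), oldest first
        self.user_items = defaultdict(dict)  # user -> {item: latest opt-in time}
        self.item_users = defaultdict(set)   # item -> users with a live opt-in

    def add(self, user, item, now):
        """Record an opt-in; assumes calls arrive in non-decreasing time order."""
        self.evict(now)
        self.events.append((now, user, item))
        self.user_items[user][item] = now
        self.item_users[item].add(user)

    def evict(self, now):
        """Drop opt-ins older than the window so old items fall out of the model."""
        cutoff = now - self.window
        while self.events and self.events[0][0] < cutoff:
            ts, user, item = self.events.popleft()
            # Skip events that were superseded by a newer opt-in for the same pair.
            if self.user_items.get(user, {}).get(item) != ts:
                continue
            del self.user_items[user][item]
            if not self.user_items[user]:
                del self.user_items[user]
            self.item_users[item].discard(user)
            if not self.item_users[item]:
                del self.item_users[item]

    def recommend(self, user, now, how_many=5):
        """Rank unseen items by how many overlapping users also opted in to them."""
        self.evict(now)
        seen = self.user_items.get(user, {})
        scores = defaultdict(int)
        for item in seen:
            for other in self.item_users[item]:
                if other == user:
                    continue
                for candidate in self.user_items[other]:
                    if candidate not in seen:
                        scores[candidate] += 1
        # Highest count first; ties broken by item ID for a stable order.
        return sorted(scores, key=lambda i: (-scores[i], i))[:how_many]
```

A new item becomes recommendable as soon as one user with overlapping history
opts in to it, and an old item disappears once its last opt-in ages out.
Raw counts favour popular items; a real similarity measure would replace the
`+= 1` scoring.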

On Tue, Oct 25, 2011 at 2:59 PM, Vishal Santoshi
<[email protected]> wrote:
> Hello Folks,
> Item-based recommendation for my dataset is excruciatingly slow on an
> 8-node cluster. Yes, the number of items is big, and the dataset churn
> does not allow for a long asynchronous process. Recommendations cannot be
> stale (a 30-minute delay is stale). I have tried out MinHash clustering,
> and that is scalable, but without a "degree of association" with the
> multiple clusters any user may belong to, it seems less tight than a pure
> item-based (and thus similarity-probability) algorithm.
>
> Any ideas on how we can pull this off, where:
>
> * The item churn is frequent. New items enter the dataset all the time.
> * There is no "preference" apart from opt in.
> * Very frequent anonymous users enter the system almost all the time.
>
>
> Scale is very important.
>
> I am tending towards MinHash with additional algorithms that are executed
> offline, and co-occurrence.
>
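
On the "degree of association" point in the quoted message: MinHash
signatures can also be compared directly, since the fraction of matching
signature positions estimates the Jaccard similarity of the two underlying
sets. The following is a minimal sketch of that idea in plain Python, not
Mahout's MinHash clustering. The function names, the use of blake2b as the
hash family, and the default of 128 hash functions are assumptions made for
illustration.

```python
import hashlib


def minhash_signature(items, num_hashes=128):
    """Return a MinHash signature for a non-empty collection of item IDs.

    Each of the num_hashes seeded hash functions contributes the minimum
    hash value seen over the set.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two equal-length signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Comparing a user's signature against a cluster's (or another user's) gives a
graded score between 0 and 1 rather than a yes/no cluster membership. The
estimate is noisy, and more hash functions reduce the noise at the cost of
more work per set.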
