On Sun, May 19, 2013 at 8:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> Won't argue with how fast Solr is. It's another fast and scalable lookup
> engine and another option. Especially if you don't need to look up anything
> else by user, in which case you are back to a db...

But remember, it is also doing more than lookup. It is computing scores on
items and retaining the highest-scoring items.

> Using a cooccurrence matrix means you are doing item similarity since
> there is no user data in the matrix. Or are you talking about using the
> user history as the query? In which case you have to remember somewhere
> all users' history and look it up for the query, no?

Yes. You do. And that is the key to making this orders of magnitude faster.

But that is generally fairly trivial to do. One option is to keep it in a
cookie. Another is to use browser persistent storage. Another is to use a
memory-based user profile database. Yet another is to use M7 tables on MapR
or HBase on other Hadoop distributions.

> On May 19, 2013, at 8:09 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > On Sun, May 19, 2013 at 8:04 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> > > Two basic solutions to this are: factorize (reduces 100s of thousands
> > > of items to hundreds of 'features') and continue to calculate recs at
> > > runtime, which you have to do with Myrrix since Mahout does not have an
> > > in-memory ALS impl, or move to the Mahout Hadoop recommenders and
> > > pre-calculate recs.
> >
> > Or sparsify the cooccurrence matrix and run recommendations out of a
> > search engine.
> >
> > This will scale to thousands or tens of thousands of recommendations per
> > second against tens of millions of items. The number of users doesn't
> > matter.
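[Editor's note: the scheme described above — keep each user's history somewhere cheap, then use that history as the query against an index of sparsified cooccurrence indicators — can be sketched in miniature without a real search engine. Everything below is illustrative: the item names, the toy indicator rows, and the plain overlap score are stand-ins; a real deployment would index each item's indicator list as a document in Solr and let its TF-IDF-style scoring do the ranking.]

```python
from collections import defaultdict

# Toy "index": one document per catalog item, whose body is the list of
# indicator items that significantly cooccur with it (the sparsified
# cooccurrence row, e.g. items that survive an LLR significance test).
# These rows are made up for illustration.
indicator_docs = {
    "itemA": ["itemB", "itemC"],
    "itemB": ["itemA", "itemD"],
    "itemC": ["itemA"],
    "itemD": ["itemB", "itemC"],
}

def recommend(user_history, docs, top_n=2):
    """Score each candidate item by how many of the user's history items
    appear among its indicators. A search engine would run the same query
    (history items OR'ed together) with TF-IDF weighting instead of a
    raw count."""
    scores = defaultdict(int)
    for item, indicators in docs.items():
        if item in user_history:
            continue  # don't recommend items the user already interacted with
        for h in user_history:
            if h in indicators:
                scores[item] += 1
    # Rank by score descending, breaking ties by item name for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# A user who interacted with itemA gets the items whose indicator lists
# contain itemA.
print(recommend(["itemA"], indicator_docs))
```

Note that the per-request work is just one lookup (the user's history) plus one search query, which is why the number of users doesn't affect throughput.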