
Yes.  The first part is probably just the RowSimilarity job, especially
after Sebastian puts in the down-sampling.
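
For concreteness, here is a minimal Python sketch of the LLR (log-likelihood ratio) scoring that the RowSimilarity step applies to each pair of items. It assumes the action matrix fits in memory as a map from item to the set of users who acted on it. It scores every pair directly and omits the down-sampling. It is not Mahout's RowSimilarityJob code, and it returns the raw LLR score rather than any normalized similarity.

```python
import math
from itertools import combinations


def x_log_x(x):
    # Convention: 0 * log(0) == 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy: N*H = N log N - sum(k log k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # k11: users with both items, k12: A only, k21: B only, k22: neither
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def item_similarities(actions, num_users):
    # actions: dict item -> set of user ids (a sketch of the action matrix)
    sims = {}
    for a, b in combinations(sorted(actions), 2):
        users_a, users_b = actions[a], actions[b]
        k11 = len(users_a & users_b)
        k12 = len(users_a - users_b)
        k21 = len(users_b - users_a)
        k22 = num_users - k11 - k12 - k21
        score = llr(k11, k12, k21, k22)
        sims.setdefault(a, {})[b] = score
        sims.setdefault(b, {})[a] = score
    return sims
```

The all-pairs loop is quadratic in the number of items, which is one reason the down-sampling matters once the data gets large.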

The new part is exactly as you say: storing the DRM in Solr indexes.
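
One plausible layout, an assumption not spelled out in this thread, is one Solr document per item with a multi-valued field listing that item's most similar items. The sketch below turns similarity rows (item to {other item: score}) into such documents and serializes them as a JSON array, the shape Solr's JSON update handler accepts. The field names `id` and `indicators` and the `top_k`/`min_score` cutoffs are hypothetical; nothing is actually sent to Solr.

```python
import json


def to_solr_docs(sims, top_k=50, min_score=0.0):
    # sims: dict item_id -> dict other_item_id -> similarity score
    # "id" and "indicators" are hypothetical schema field names.
    docs = []
    for item in sorted(sims):
        row = sims[item]
        # Keep the strongest neighbours, breaking score ties by item id
        ranked = sorted(
            (other for other, score in row.items() if score > min_score),
            key=lambda other: (-row[other], other),
        )[:top_k]
        docs.append({"id": item, "indicators": ranked})
    return docs


def to_update_payload(docs):
    # A JSON array of documents, suitable as the body of an update request
    return json.dumps(docs)
```

At query time, a user's recent history could then be issued as a query against the `indicators` field, so ordinary search ranking does the recommending.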

There is no reason not to use a real data set.  There is a strong reason to
use a synthetic data set, however: it can be trivially scaled up and down
in both items and users.  A synthetic data set also doesn't require that
real data be found and downloaded.
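
As an illustration of how cheaply a synthetic set scales, here is a minimal generator sketch. The Zipf-like item popularity (item i gets weight 1 / (i + 1) ** exponent) and the fixed number of draws per user are assumptions chosen for illustration, not a model proposed in this thread. Fixing the seed makes each run reproducible.

```python
import random


def synthetic_actions(num_users, num_items, actions_per_user=20,
                      exponent=1.0, seed=42):
    # Assumed Zipf-like popularity: low-numbered items are drawn more often
    rnd = random.Random(seed)
    items = list(range(num_items))
    weights = [1.0 / (i + 1) ** exponent for i in items]
    actions = set()
    for user in range(num_users):
        # Draw with replacement; repeated draws collapse into one action
        for item in rnd.choices(items, weights=weights, k=actions_per_user):
            actions.add((user, item))
    return sorted(actions)  # list of (user, item) pairs
```

Scaling up or down is just a matter of changing `num_users` and `num_items`.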

On Sun, Jul 21, 2013 at 2:17 PM, Pat Ferrel <> wrote:

> Read the paper and the preso.
> As to the 'offline to Solr' part: it sounds like you are suggesting that
> an item-item similarity matrix be stored and indexed in Solr. One would have
> to create the action matrix from user profile data (preference history), run
> a RowSimilarity job on it (using LLR similarity), and move the result to
> Solr. The first part of this is nearly identical to the current recommender
> job workflow and could pretty easily be created from it, I think. The new
> part is taking the DistributedRowMatrix and storing it in a particular way
> in Solr, right?
> BTW, is there some reason not to use an existing real data set?
>
> On Jul 19, 2013, at 3:45 PM, Ted Dunning <> wrote:
>> OK.  I think the crux here is the offline to Solr part, so let's see who
>> else pops up.
>> Having a Solr maven could be very helpful.
