On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> +10
>
> Love the academics but I agree with this. Recently saw a VP from Netflix
> plead with the audience (mostly academics) to move past RMSE--focus on
> maximizing correct ranking, not rating prediction.
>
> Anyway I have a pipeline that does *[ingest, prepare, row-similarity, not
> in m/r]*
>

Is this available?

> replaces PreparePreferenceMatrixJob to create n matrixes depending on the
> number of actions you are splitting out. This job also creates external <->
> internal item and user id BiHashMaps for going back and forth between the
> log's IDs and Mahout internal IDs. It guarantees a uniform item and user ID
> space and sparse matrix ranks by creating one from all actions. Not
> completely scalable since it is not done in m/r though it uses HDFS--I have
> a plan to m/r the process and get rid of the hashmap.
>

Frankly, doing it outside of map-reduce is good for a start and should be
preserved for later.  It makes on-boarding new folks much easier.
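
For newcomers, the non-m/r preparation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Pat's code: the (user, action, item) log format, the IdIndex class and the split_by_action function are all hypothetical names. The point it shows is that building the ID maps from all actions first gives every per-action matrix the same user and item ID space.

```python
from collections import defaultdict


class IdIndex:
    """Bidirectional map between external string IDs and dense internal ints.

    Hypothetical stand-in for the BiHashMap mentioned in the thread.
    """

    def __init__(self):
        self.to_internal = {}   # external string -> internal int
        self.to_external = []   # internal int -> external string

    def add(self, external_id):
        # Assign the next free integer the first time an ID is seen.
        if external_id not in self.to_internal:
            self.to_internal[external_id] = len(self.to_external)
            self.to_external.append(external_id)
        return self.to_internal[external_id]


def split_by_action(log_records):
    """Split a log into one sparse matrix per action.

    log_records: iterable of (user, action, item) string triples
    (an assumed log format, not taken from the thread).
    """
    records = list(log_records)
    users, items = IdIndex(), IdIndex()

    # First pass over ALL actions so every matrix shares one ID space.
    for user, _action, item in records:
        users.add(user)
        items.add(item)

    # Second pass: one sparse binary matrix per action,
    # stored as {internal user id: set of internal item ids}.
    matrices = defaultdict(lambda: defaultdict(set))
    for user, action, item in records:
        matrices[action][users.to_internal[user]].add(items.to_internal[item])
    return users, items, matrices
```

Everything lives in memory here, which is exactly the scalability limit noted above, but it is easy to read and to step through.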


> performs the RowSimilarityJob on the primary matrix "B" and does B'A to
> create a cooccurrence matrix for primary and secondary actions.
>

What code do you use for B'A?
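
To pin down what is meant by B'A, here is a minimal in-memory sketch. It assumes B and A are binary user-by-item matrices stored as {user: set of items} and it produces raw cooccurrence counts; it is not the Mahout code being asked about, and cross_cooccurrence is a hypothetical name.

```python
from collections import defaultdict


def cross_cooccurrence(b_rows, a_rows):
    """Compute B'A for binary user-by-item matrices.

    b_rows, a_rows: {user id: set of item ids} (assumed representation).
    Returns {item in B: {item in A: count}}, where count is the number of
    users who have the B item under the primary action and the A item
    under the secondary action.
    """
    result = defaultdict(lambda: defaultdict(int))
    for user, b_items in b_rows.items():
        a_items = a_rows.get(user, ())
        # Each user contributes the outer product of their two rows.
        for i in b_items:
            for j in a_items:
                result[i][j] += 1
    return result
```

Calling cross_cooccurrence(b_rows, b_rows) gives plain B'B counts in the same format.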


> Stores all recs from all models in a NoSQL DB.
>

I recommend not doing this for the demo, but rather storing rows of B'A and
B'B as fields in Solr.


> At rec request time it does a linear combination of rec and cross-rec to
> return the highest scored ones.


Should be integrated into the query.


> Does 1-3 fit the first part of 'offline to Solr'? The IDs can be written
> to Solr as the original external IDs from the log files, which were
> strings. This allows them to be treated as terms by Solr.
>

Yes.  These early steps are very much what I was aiming for.


> My understanding of the Solr proposal puts B's row similarity matrix in a
> vector per item.


For a particular item document, the corresponding row of B'A and the
corresponding row of B'B go into separate fields.  I think you mean B'B
when you say "B's row similarity matrix".  Just checking.
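
As a rough sketch of that layout, the documents could be assembled like this before being sent to Solr. The field names b_b_links and b_a_links, the top_n cutoff and the item_documents function are assumptions for illustration only. Scores are used just to pick which IDs to keep; only the external IDs are written, so Solr's native weighting applies.

```python
def item_documents(item_ids, b_b, b_a, top_n=50):
    """Build one document per item with separate B'B and B'A fields.

    item_ids: list mapping internal item id -> external string id.
    b_b, b_a: {row item: {column item: score}} in internal ids.
    Field names and top_n are hypothetical choices.
    """
    def top_terms(row):
        # Keep the strongest columns; break ties by id so output is stable.
        ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
        # Emit external ids as space-separated terms, without the scores.
        return " ".join(item_ids[col] for col, _score in ranked[:top_n])

    docs = []
    for internal, external in enumerate(item_ids):
        docs.append({
            "id": external,
            "b_b_links": top_terms(b_b.get(internal, {})),
            "b_a_links": top_terms(b_a.get(internal, {})),
        })
    return docs
```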



> That means each row is turned into "terms" = external IDs--not sure how
> the weights of each term are encoded.


Again, I just use native Solr weighting.


> So the cross-recommender would just put the cross-action similarity matrix
>  in other field(s) on the same itemID/docID, right?
>

Yes.  Exactly.


>
> Then the straight out recommender queries on the B'B field(s) and the
> cross-recommender queries on the B'A field(s). I suppose to keep it simple
> the cross-action similarity matrix could be put in a separate index.  Is
> this about right?
>

Yes.  And the combined recommender would query on both at the same time.
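
A sketch of what such a combined query string might look like, assuming the hypothetical field names b_b_links and b_a_links and a default OR operator in the query parser. The user's primary-action history goes against the B'B field and the secondary-action history against the B'A field, so a single query scores both.

```python
def combined_query(primary_history, secondary_history,
                   bb_field="b_b_links", ba_field="b_a_links"):
    """Build one Lucene-syntax query over both indicator fields.

    primary_history, secondary_history: lists of external item ids the
    user has touched under each action. Field names are hypothetical.
    """
    def clause(field, ids):
        if not ids:
            return None
        # Quote each id so it is matched as a single term or phrase.
        quoted = " ".join(
            '"%s"' % i.replace("\\", "\\\\").replace('"', '\\"') for i in ids
        )
        return "%s:(%s)" % (field, quoted)

    clauses = [
        clause(bb_field, primary_history),
        clause(ba_field, secondary_history),
    ]
    # Drop empty clauses so a user with one kind of history still works.
    return " ".join(c for c in clauses if c)
```

The resulting string would be passed as the q parameter of an ordinary Solr search; leaving out one of the histories reduces it to the straight or the cross recommender.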
