On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> +10
>
> Love the academics but I agree with this. Recently saw a VP from Netflix
> plead with the audience (mostly academics) to move past RMSE--focus on
> maximizing correct ranking, not rating prediction.
>
> Anyway I have a pipeline that does *[ingest, prepare, row-similarity, not
> in m/r]*
>

Is this available?

> replaces PreparePreferenceMatrixJob to create n matrixes depending on the
> number of actions you are splitting out. This job also creates external <->
> internal item and user id BiHashMaps for going back and forth between the
> log's IDs and Mahout internal IDs. It guarantees a uniform item and user ID
> space and sparse matrix ranks by creating one from all actions. Not
> completely scalable since it is not done in m/r though it uses HDFS--I have
> a plan to m/r the process and get rid of the hashmap.
>

Frankly, doing it outside of map-reduce is good for a start and should be
preserved for later. It makes on-boarding new folks much easier.

> performs the RowSimilarityJob on the primary matrix "B" and does B'A to
> create a cooccurrence matrix for primary and secondary actions.
>

What code do you use for B'A?

> Stores all recs from all models in a NoSQL DB.
>

I recommend not doing this for the demo, but rather storing rows of B'A and
B'B as fields in Solr.

> At rec request time it does a linear combination of req and cross-rec to
> return the highest scored ones.

Should be integrated into the query.

> Does 1-3 fit the first part of 'offline to Solr'? The IDs can be written
> to Solr as the original external IDs from the log files, which were
> strings. This allows them to be treated as terms by Solr.
>

Yes. These early steps are very much what I was aiming for.

> My understanding of the Solr proposal puts B's row similarity matrix in a
> vector per item.

For a particular item document, the corresponding row of B'A and the
corresponding row of B'B go into separate fields. I think you mean B'B when
you say "B's row similarity matrix". Just checking.
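
To make the B'B / B'A notation concrete, here is a minimal sketch in plain Python. It is not Mahout code and not what either pipeline actually runs. It assumes B is the user-by-item matrix for the primary action and A is the user-by-item matrix for a secondary action, both over the same user ID space. The toy logs, the function names `cooccurrence` and `item_documents`, and the field names `bb_terms` and `ba_terms` are all made up for illustration. It keeps raw cooccurrence counts only; RowSimilarityJob would normally apply a similarity measure and prune weak pairs.

```python
from collections import defaultdict
from itertools import product


def cooccurrence(left, right):
    """Return X'Y as nested dicts for two user -> item-set logs.

    counts[i][j] is the number of users who have item i in `left`
    and item j in `right`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, items in left.items():
        for i, j in product(items, right.get(user, ())):
            counts[i][j] += 1
    return counts


def item_documents(btb, bta):
    """Turn rows of B'B and B'A into one document per item.

    Each row becomes a space-separated string of external item IDs,
    so a search engine can treat the IDs as terms. The diagonal of
    B'B (an item cooccurring with itself) is dropped.
    """
    docs = []
    for item in sorted(btb):
        docs.append({
            "id": item,
            "bb_terms": " ".join(sorted(j for j in btb[item] if j != item)),
            "ba_terms": " ".join(sorted(bta.get(item, {}))),
        })
    return docs


# Hypothetical toy logs: user ID -> set of external item IDs.
purchases = {"u1": {"ipad", "case"}, "u2": {"ipad", "stylus"}, "u3": {"case"}}  # B
views = {"u1": {"ipad", "nexus"}, "u2": {"nexus"}, "u3": {"case", "stylus"}}    # A

btb = cooccurrence(purchases, purchases)  # B'B
bta = cooccurrence(purchases, views)      # B'A
docs = item_documents(btb, bta)
```

Each entry in `docs` corresponds to one item document with the B'B row and the B'A row in separate fields, which is the layout described above. The counts are discarded when writing the fields, on the assumption that the search engine's native term weighting takes over from there.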
> That means each row is turned into "terms" = external IDs--not sure how
> the weights of each term are encoded.

Again, I just use native Solr weighting.

> So the cross-recommender would just put the cross-action similarity matrix
> in other field(s) on the same itemID/docID, right?
>

Yes. Exactly.

>
> Then the straight out recommender queries on the B'B field(s) and the
> cross-recommender queries on the B'A field(s). I suppose to keep it simple
> the cross-action similarity matrix could be put in a separate index. Is
> this about right?
>

Yes. And the combined recommender would query on both at the same time.
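
As a rough illustration of querying both fields at once, the sketch below builds a single query string in the standard Lucene/Solr query syntax. The field names `bb_terms` and `ba_terms`, the function name `combined_query`, and the `cross_weight` parameter are assumptions for the example, not names from any existing schema or from Mahout. The boost only weights one clause against the other; the final ranking is whatever the engine's scoring produces, not a literal linear combination.

```python
def combined_query(primary_history, secondary_history,
                   bb_field="bb_terms", ba_field="ba_terms",
                   cross_weight=1.0):
    """Build one query over the B'B and B'A fields.

    primary_history:   external item IDs from the user's primary actions
    secondary_history: external item IDs from the user's secondary actions
    cross_weight:      boost for the cross-recommendation clause
    """
    def quote(term):
        # Quote each ID so characters in external IDs are taken literally.
        return '"%s"' % term.replace("\\", "\\\\").replace('"', '\\"')

    def clause(field, terms, boost):
        return "%s:(%s)^%s" % (field, " ".join(quote(t) for t in terms), boost)

    clauses = []
    if primary_history:
        clauses.append(clause(bb_field, primary_history, 1.0))
    if secondary_history:
        clauses.append(clause(ba_field, secondary_history, cross_weight))
    return " OR ".join(clauses)


# Hypothetical user: bought an ipad, viewed a nexus and a case.
query = combined_query(["ipad"], ["nexus", "case"], cross_weight=0.5)
```

The resulting string would go in the `q` parameter of a Solr select request. Leaving one history empty reduces it to the straight recommender or the cross-recommender alone.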