Not following so… So here is what I've done, in probably too much detail:

1) ingest raw log files and split them up by action
2) turn these into Mahout preference files using Mahout type IDs, keeping a map of IDs
3) run the Mahout item-based recommender using LLR for similarity
4) create a Mahout-style cross-recommender using cooccurrence similarity, computed with matrix math
5) given two similarity matrices and a user history matrix, I am writing them to csv files with the Mahout IDs replaced by the original string external IDs for users and items

Input log file before splitting:

u1 purchase iphone
u1 purchase ipad
u2 purchase nexus-tablet
u2 purchase galaxy
u3 purchase surface
u4 purchase iphone
u4 purchase ipad
u1 view iphone
u1 view ipad
u1 view nexus-tablet
u1 view galaxy
u2 view iphone
u2 view ipad
u2 view nexus-tablet
u2 view galaxy
u3 view surface
u4 view iphone
u4 view ipad
u4 view nexus-tablet

Input user history DRM after ID translation to Mahout IDs and splitting for action "purchase":

B
user/item  iphone  ipad  nexus-tablet  galaxy  surface
u1         1       1     0             0       0
u2         0       0     1             1       0
u3         0       0     0             0       1
u4         1       1     0             0       0

Map of IDs, Mahout to original/external:

0 -> iphone
1 -> ipad
2 -> nexus-tablet
3 -> galaxy
4 -> surface

To be specific, the DRM from the RecommenderJob with item-item similarities using LLR looks like this:

Input Path: out/p-recs/sims/part-r-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
Key: 0: Value: {1:0.8472157541208549}
Key: 1: Value: {0:0.8472157541208549}
Key: 2: Value: {3:0.8181382096075936}
Key: 3: Value: {2:0.8181382096075936}
Key: 4: Value: {}

This will be written to a directory for later Solr indexing as a csv of the form:

item_id,similar_items,cross_action_similar_items
iphone,ipad,
ipad,iphone,
nexus-tablet,galaxy,
galaxy,nexus-tablet,
surface,,

By using a user's history vector as a query you get results = recommendations. So if the user is u1, the history vector is: "iphone ipad"

The Solr results for query "iphone ipad" using field "similar_items" will be:

1. Doc ID, ipad
2. Doc ID, iphone

If you want item similarities, for instance if a user is anonymous with no history and is looking at an iphone product page, you would fetch the doc for id = "iphone" and get: "ipad"

Perhaps a bad example for ordering, since there is only one ID in the doc, but the items in the "similar_items" field would be ordered by similarity strength.

Likewise for the cross-action similarities, though the matrix will have cooccurrence [B'A] values in the DRM.

For item similarities there is no need to do more than fetch one doc that contains the similarities, right? I've successfully used this method with the Mahout recommender, but please correct me if something above is wrong.

On Jul 31, 2013, at 4:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Pat,

See inline

On Wed, Jul 31, 2013 at 1:29 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> So the XML as CSV would be:
> item_id,similar_items,cross_action_similar_items
> ipad,iphone,iphone nexus
> iphone,ipad,ipad galaxy
>

Right. Doesn't matter what format. Might want quotes around space-delimited lists, but anything will do.

>
> Note: As I mentioned before the order of the items in the field will
> encode rank of the similarity strength. This is for cases where you want to
> find similar items to a context item. You would fetch the doc for the
> context item by its item ID and show the top k items in the doc. Ted's
> caveat would probably be to dither them.
>

I always say "dither" so that is an easy one.

But fetching similar items of a center item by fetching the center item and then fetching each of the referenced items is typically slower by about 2x than running the search for mentions of the center item.

> Sounds like Ted is generating data. Andrew or M Lyon do either of you want
> to set the demo system up? If so you'll need to find a system--free tier
> AWS, Ted's box, etc. Then install all the needed stuff.
>
> I'll get the output working to csv.
>
> On Jul 31, 2013, at 11:51 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:
>
> OK and yes. The docs will look like:
>
> <add>
>   <doc>
>     <field name='item_id'>ipad</field>
>     <field name='similar_items'>iphone</field>
>     <field name='cross_action_similar_items'>iphone nexus</field>
>   </doc>
>   <doc>
>     <field name='item_id'>iphone</field>
>     <field name='similar_items'>ipad</field>
>     <field name='cross_action_similar_items'>ipad galaxy</field>
>   </doc>
> </add>
>
>
> On Jul 31, 2013, at 11:42 AM, B Lyon <bradfl...@gmail.com> wrote:
>
> I'm interested in helping as well.
> Btw I thought that what was stored in the solr fields were the llr-filtered
> items (ids I guess) for the could-be-recommended things.
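
For reference, step 5 of the top message (writing a similarity DRM out as csv with the external IDs swapped back in, strongest similarity first) could be sketched roughly as below. This is only a minimal Python sketch, not the actual Mahout or Solr code: the names ID_TO_ITEM, LLR_SIMS, ranked_items and drm_to_csv are hypothetical, the DRM is assumed to have already been read into plain dicts, and the cross-action matrix is passed in empty because its [B'A] values are not shown in this thread.

```python
import csv
import io

# Map of Mahout IDs to original/external item IDs, copied from the message above.
ID_TO_ITEM = {0: "iphone", 1: "ipad", 2: "nexus-tablet", 3: "galaxy", 4: "surface"}

# Item-item LLR similarity rows, copied from the out/p-recs/sims dump above.
# Assumption: the sequence file has already been parsed into {row: {col: value}}.
LLR_SIMS = {
    0: {1: 0.8472157541208549},
    1: {0: 0.8472157541208549},
    2: {3: 0.8181382096075936},
    3: {2: 0.8181382096075936},
    4: {},
}


def ranked_items(row, id_to_item):
    """Return one DRM row as space-delimited external IDs, strongest first."""
    ordered = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(id_to_item[item_id] for item_id, _ in ordered)


def drm_to_csv(sims, cross_sims, id_to_item):
    """Build the csv text with one line per item, ready for indexing."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item_id", "similar_items", "cross_action_similar_items"])
    for item_id in sorted(id_to_item):
        writer.writerow([
            id_to_item[item_id],
            ranked_items(sims.get(item_id, {}), id_to_item),
            ranked_items(cross_sims.get(item_id, {}), id_to_item),
        ])
    return out.getvalue()


# Cross-action similarities are left empty here (values not given in the thread).
csv_text = drm_to_csv(LLR_SIMS, {}, ID_TO_ITEM)
print(csv_text)
```

Sorting each row by value before joining is what makes the field order encode similarity rank, so fetching a single doc and taking its first k IDs gives the top-k similar items.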