Sorry to be dense but I think there is some miscommunication. The most important question is: am I writing the item-item similarity matrix DRM out to Solr, one row = one Solr doc? For the MapReduce Mahout item-based recommender this is in "tmp/similarityMatrix". If not, then please stop me. If I'm off base here, maybe a Skype or IM session will straighten me out. pat.fer...@gmail.com or p...@occamsmachete.com
To be clear below I'm not talking about history based recs, which is the primary use case. I am talking about a query that does not use history, that only finds similar items based on training data.

The item-item similarity matrix DRM contains Key = item ID, Value = list of item IDs with similarity strengths. This is equivalent to the list returned by ItemBasedRecommender's

    public List<RecommendedItem> mostSimilarItems(long itemID, int howMany) throws TasteException

    Specified by:
        mostSimilarItems in interface ItemBasedRecommender
    Parameters:
        itemID - ID of item for which to find most similar other items
        howMany - desired number of most similar items to find
    Returns:
        items most similar to the given item, ordered from most similar to least

To get the list from Solr you would fetch the doc associated with "itemID", no? When using the Mahout mapreduce item-based recommender we get the similarity matrix and do just that. We get the row associated with the Mahout itemID and recommend the top k items from the vector. This performs well in cross-validation tests.

On Aug 1, 2013, at 9:49 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 8:46 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

>
> For item similarities there is no need to do more than fetch one doc that
> contains the similarities, right? I've successfully used this method with
> the Mahout recommender but please correct me if something above is wrong.

No.

First, you need to retrieve all the other documents that are referenced to get their display meta-data. So this isn't just a one document fetch.

Second, the similar items point inwards, not outwards. Thus, the query you want has the id of the current item and searches the similar_items field. The result of that search is all of the similar items.

The confusion here may stem from the name of the field. A name like "linked-from-items" or some such might help here.
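
To make the two access patterns concrete, here is a rough in-memory sketch in Python. It is not Solr or Mahout code. The data, the "title" metadata, and the helper names are all made up for illustration, and the field is called linked_from_items following the renaming suggested above. It reflects one reading of "point inwards": each doc lists the items whose similarity rows mention it.

```python
from collections import defaultdict

# Hypothetical similarity rows: item ID -> [(similar item ID, strength), ...]
similarity_rows = {
    "A": [("B", 0.9), ("C", 0.4)],
    "B": [("A", 0.9)],
    "C": [("A", 0.4), ("B", 0.2)],
}

# Hypothetical display metadata, one entry per item.
metadata = {"A": "Item A", "B": "Item B", "C": "Item C"}


def most_similar_by_row_fetch(item_id, how_many):
    """Pattern 1: fetch the single row for item_id and take its top entries."""
    row = sorted(similarity_rows.get(item_id, []), key=lambda p: p[1], reverse=True)
    return [other for other, _strength in row[:how_many]]


def build_inward_docs(rows, meta):
    """One doc per item; its link field lists the items whose rows point at it."""
    linked_from = defaultdict(list)
    for source, neighbours in rows.items():
        for target, _strength in neighbours:
            linked_from[target].append(source)
    return [
        {"id": item, "title": title, "linked_from_items": sorted(linked_from[item])}
        for item, title in meta.items()
    ]


def search_link_field(docs, item_id):
    """Pattern 2: search the link field for item_id; hits carry their metadata."""
    return [doc for doc in docs if item_id in doc["linked_from_items"]]


docs = build_inward_docs(similarity_rows, metadata)
hits = search_link_field(docs, "A")
```

With the row fetch you get back bare IDs and still have to look up each referenced doc for display. With the field search the hits are the similar items themselves, metadata included. A real search engine would also rank the hits; this toy version does no scoring.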
Another way to look at this is that there should be no procedural difference if you have 10 items or 20 in your history. Either way, your history is a query against the appropriate link fields. Likewise, there should be no difference between having 10 items or 2 items in your history. There shouldn't be any difference even if you have just 1 item in your history. Finding items similar to a single item is exactly like having 1 item in your history. So that should be done by searching with that one item in the appropriate link fields.
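
A small Python sketch of that idea, again in memory rather than against Solr. The docs, the linked_from_items field name, and the scoring are assumptions for illustration: the score is just a count of matching history items, a crude stand-in for whatever relevance scoring the search engine applies, and skipping items already in the history is my own choice here.

```python
# Hypothetical indexed docs; each link field lists the items that point at the doc.
docs = [
    {"id": "A", "linked_from_items": ["B", "C"]},
    {"id": "B", "linked_from_items": ["A", "C"]},
    {"id": "C", "linked_from_items": ["A"]},
    {"id": "D", "linked_from_items": ["B"]},
]


def recommend(history, docs, how_many=10):
    """Treat the history as a query of item IDs against the link field."""
    query = set(history)
    scored = []
    for doc in docs:
        if doc["id"] in query:
            continue  # assumption: do not recommend items already in the history
        overlap = len(query & set(doc["linked_from_items"]))
        if overlap:
            scored.append((overlap, doc["id"]))
    # Highest overlap first, ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _overlap, item_id in scored[:how_many]]


def similar_items(item_id, docs, how_many=10):
    """Item similarity is the same query with a history of exactly one item."""
    return recommend([item_id], docs, how_many)
```

The same function serves both cases, so there is no separate code path for "similar to this item" versus "recommended for this history"; only the length of the query changes.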