Sorry to be dense, but I think there is some miscommunication. The most 
important question is: am I writing the item-item similarity matrix DRM out to 
Solr, one row = one Solr doc? For the mapreduce Mahout item-based recommender 
this is in "tmp/similarityMatrix". If not, then please stop me. If I'm off base 
here, maybe a Skype or IM session will straighten me out: pat.fer...@gmail.com 
or p...@occamsmachete.com


To be clear, below I'm not talking about history-based recs, which are the 
primary use case. I am talking about a query that does not use history and 
only finds similar items based on training data. The item-item similarity 
matrix DRM contains Key = item ID, Value = list of item IDs with similarity 
strengths.

This is equivalent to the list returned by ItemBasedRecommender's 
mostSimilarItems:

public List<RecommendedItem> mostSimilarItems(long itemID, int howMany) throws TasteException

Specified by:
mostSimilarItems in interface ItemBasedRecommender

Parameters:
itemID - ID of item for which to find most similar other items
howMany - desired number of most similar items to find

Returns:
items most similar to the given item, ordered from most similar to least

To get the list from Solr you would fetch the doc associated with "itemID", no? 

When using the Mahout mapreduce item-based recommender we get the similarity 
matrix and do just that. We get the row associated with the Mahout itemID and 
recommend the top k items from the vector. This performs well in 
cross-validation tests.
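
To make the row lookup concrete, here is a minimal sketch of what I mean. It uses a plain in-memory map as a stand-in for the similarity matrix; the class name SimilarityRowLookup and the map layout are my own assumptions for illustration, not Mahout or Solr API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the item-item similarity matrix:
// key = item ID, value = (similar item ID -> similarity strength).
// This is an illustration only, not Mahout or Solr API.
class SimilarityRowLookup {

    // Fetches the single row stored under itemID and returns up to howMany
    // item IDs from it, ordered from most similar to least.
    static List<Long> mostSimilarItems(
            Map<Long, Map<Long, Double>> matrix, long itemID, int howMany) {
        Map<Long, Double> row =
                matrix.getOrDefault(itemID, Collections.<Long, Double>emptyMap());

        // Sort the row's entries by descending similarity strength.
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(row.entrySet());
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());

        // Keep the IDs of the top howMany entries.
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : entries) {
            if (result.size() >= howMany) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

An unknown item ID simply yields an empty list here; what a real deployment should do in that case is a separate question.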



On Aug 1, 2013, at 9:49 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 8:46 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> 
> For item similarities there is no need to do more than fetch one doc that
> contains the similarities, right? I've successfully used this method with
> the Mahout recommender but please correct me if something above is wrong.


No.

First, you need to retrieve all the other documents that are referenced to
get their display meta-data. So this isn't just a one document fetch.

Second, the similar items point inwards, not outwards.  Thus, the query you
want has the id of the current item and searches the similar_items field.
The result of that search is all of the similar items.

The confusion here may stem from the name of the field.  A name like
"linked-from-items" or some such might help here.


Another way to look at this is that there should be no procedural
difference if you have 10 items or 20 in your history.  Either way, your
history is a query against the appropriate link fields.  Likewise, there
should be no difference between having 10 items or 2 items in your history.
There shouldn't be any difference even if you have just 1 item in
your history.

Finding items similar to a single item is exactly like having 1 item in
your history.  So that should be done by searching with that one item in
the appropriate link fields.
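
As a rough illustration of the search described above, the sketch below uses an in-memory map in place of a Solr index. The class name LinkFieldSearch, the map layout, and the hit-count ranking are assumptions made for the example; the hit count is only a crude stand-in for Solr's relevance scoring. With a field named similar_items, the corresponding Solr query would presumably look something like similar_items:(1 2).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a Solr index: each document has an ID
// and a link field (called similar_items in this thread) holding item IDs.
// This is an illustration only, not Solr API.
class LinkFieldSearch {

    // Returns the IDs of all documents whose link field contains at least one
    // of the query items. The same call serves a history of 1, 2, or 20 items.
    // Documents matching more query items come first; ties go to the lower ID.
    static List<Long> search(Map<Long, Set<Long>> linkField, List<Long> queryItems) {
        Map<Long, Integer> hits = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> doc : linkField.entrySet()) {
            int matched = 0;
            for (Long q : queryItems) {
                if (doc.getValue().contains(q)) {
                    matched++;
                }
            }
            if (matched > 0) {
                hits.put(doc.getKey(), matched);
            }
        }

        List<Long> result = new ArrayList<>(hits.keySet());
        result.sort((a, b) -> {
            int byHits = Integer.compare(hits.get(b), hits.get(a));
            return byHits != 0 ? byHits : Long.compare(a, b);
        });
        return result;
    }
}
```

Searching with a single item ID and searching with a longer history go through exactly the same code path; only the length of queryItems changes.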
