On Thu, Aug 1, 2013 at 11:58 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> Sorry to be dense but I think there is some miscommunication. The most
> important question is: am I writing the item-item similarity matrix DRM out
> to Solr, one row = one Solr doc?


Each row = one *field* in a Solr doc.  Different DRMs produce different
fields in the same docs.

There will also be item meta-data in the field.


> For the mapreduce Mahout Item-based recommender this is in
> "tmp/similarityMatrix". If not then please stop me. If I'm off base here,
> maybe a Skype or IM session will straighten me out. pat.ferrel@gmail.com or
> p...@occamsmachete.com


Actually, that is a grand idea.  Let's do a hangout.

From the who-is-free-when survey
<https://docs.google.com/forms/d/1skIaqe0CBWO4qemTyHCZwS40YjXJ9FeLCqwV8cw4Gno/viewform>,
it looks like lots of people are available tomorrow at 2PM PDT.

Would that work?

> To be clear, below I'm not talking about history-based recs, which is the
> primary use case. I am talking about a query that does not use history,
> that only finds similar items based on training data. The item-item
> similarity matrix DRM contains Key = item ID, Value = list of item IDs with
> similarity strengths.
>

Yes.  I absolutely agree that you can do this.

These should, strictly speaking, be columns in the item-item matrix.  The
item-item matrix may or may not be symmetric.  If it is symmetric, then
column or row doesn't matter.


> This is equivalent to the list returned by ItemBasedRecommender's
> public List<RecommendedItem> mostSimilarItems(long itemID, int howMany)
> throws TasteException
>

Yes.


> Specified by:
> mostSimilarItems in interface ItemBasedRecommender
>
> Parameters:
> itemID - ID of item for which to find most similar other items
> howMany - desired number of most similar items to find
>
> Returns:
> items most similar to the given item, ordered from most similar to least
>
> To get the list from Solr you would fetch the doc associated with
> "itemID", no?
>

If you store the column, then yes.

If you store the row, then using a query on the field containing the
similar items is the right answer.
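
A minimal sketch of the row/column distinction, using an in-memory stand-in for
the Solr index rather than Solr itself. The item IDs and the field name
"similar_items" are assumptions made up for illustration. Fetching the doc by
id reads back the stored row; searching the field for the id returns the
column. The two only agree when the item-item matrix is symmetric.

```python
# Toy item-item similarity rows: item -> items listed in that item's row.
# The IDs and the field name "similar_items" are hypothetical.
rows = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "b"],
}

# One doc per item; the row is stored as a multi-valued field.
docs = {item: {"id": item, "similar_items": sims} for item, sims in rows.items()}


def fetch_row(item_id):
    """Fetch-by-id: read the stored row out of the item's own doc."""
    return list(docs[item_id]["similar_items"])


def search_field(item_id):
    """Search: find every doc whose similar_items field contains item_id.

    This yields the column of the item-item matrix for item_id.
    """
    return sorted(d["id"] for d in docs.values() if item_id in d["similar_items"])


print(fetch_row("b"))     # the row stored in b's doc
print(search_field("b"))  # the docs that link to b
```

Here the toy matrix is deliberately not symmetric, so the two calls return
different lists for item "b".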

The key difference that I have is what happens in the next step.

> When using the Mahout mapreduce item-based recommender we get the
> similarity matrix and do just that. We get the row associated with the
> Mahout itemID and recommend the top k items from the vector. This performs
> well in cross-validation tests.
>

Good.

I think that there is a row/column confusion here, but they are probably
nearly identical in your application.

The key point is what happens *after* you do the query that you are
suggesting.

In your case, you have to retrieve the meta-data associated with each of
the related items.  I like to store this meta-data in a Solr field (or three)
so this involves at least one additional query.  You can automatically
chain this second query by using the "join" operation that Solr provides,
but the second query still happens.

If you do the query the way that I suggest, this second query doesn't need
to happen.  You get the meta-data directly.
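
A rough sketch of why the search form avoids the second lookup, again with an
in-memory list standing in for the index. The doc contents, the "title" and
"similar_items" field names, and the overlap-count ranking are all assumptions
for illustration; the ranking is only a stand-in for Solr's own relevance
scoring. The same function handles a history of one item or of many, and the
hits already carry their meta-data.

```python
# Hypothetical docs: each carries display meta-data plus a link field.
DOCS = [
    {"id": "a", "title": "Item A", "similar_items": ["b", "c"]},
    {"id": "b", "title": "Item B", "similar_items": ["a"]},
    {"id": "c", "title": "Item C", "similar_items": ["a", "b"]},
    {"id": "d", "title": "Item D", "similar_items": ["c"]},
]


def solr_query(history, field="similar_items"):
    """Build a query string in Solr's standard syntax: field:(x OR y)."""
    return f"{field}:({' OR '.join(history)})"


def search_links(history, field="similar_items"):
    """Return matching docs with meta-data, best match first.

    Ranking here is a plain overlap count, not Solr's scoring.
    """
    wanted = set(history)
    hits = []
    for doc in DOCS:
        overlap = len(wanted & set(doc[field]))
        if overlap:
            hits.append((overlap, doc))
    # Highest overlap first; ties broken by id so the order is stable.
    hits.sort(key=lambda h: (-h[0], h[1]["id"]))
    return [{"id": d["id"], "title": d["title"]} for _, d in hits]


print(solr_query(["b"]))
print(search_links(["b"]))        # similar items for a single item
print(search_links(["a", "b"]))   # same code path for a two-item history
```

With the fetch-by-id approach, the caller would get back only a list of ids
and would still have to look up each title separately.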





>
>
>
> On Aug 1, 2013, at 9:49 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> On Thu, Aug 1, 2013 at 8:46 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> >
> > For item similarities there is no need to do more than fetch one doc that
> > contains the similarities, right? I've successfully used this method with
> > the Mahout recommender but please correct me if something above is wrong.
>
>
> No.
>
> First, you need to retrieve all the other documents that are referenced to
> get their display meta-data. So this isn't just a one document fetch.
>
> Second, the similar items point inwards, not outwards.  Thus, the query you
> want has the id of the current item and searches the similar_items field.
> The result of that search is all of the similar items.
>
> The confusion here may stem from the name of the field.  A name like
> "linked-from-items" or some such might help here.
>
>
> Another way to look at this is that there should be no procedural
> difference if you have 10 items or 20 in your history.  Either way, your
> history is a query against the appropriate link fields.  Likewise, there
> should be no difference between having 10 items or 2 items in your history.
> There shouldn't even be any difference if you have even just 1 item in
> your history.
>
> Finding items similar to a single item is exactly like having 1 item in
> your history.  So that should be done by searching with that one item in
> the appropriate link fields.
>
>
