Solr uses cosine similarity for its queries. The implementation on GitHub uses 
Mahout LLR for calculating the item-item similarity matrix, but when you do the 
more-like-this query at runtime Solr uses cosine. This can be fixed in Solr; I'm 
not sure how much work it would take.
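
To make the difference between the two measures concrete, here is a minimal sketch. The LLR function uses the entropy form of the log-likelihood ratio over a 2x2 cooccurrence table, which is how I understand Mahout's LogLikelihood to compute it; treat that as an assumption, not a description of the GitHub code. All function names are made up for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 cooccurrence table.

    k11: users who interacted with both item A and item B
    k12: users with A but not B
    k21: users with B but not A
    k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cosine(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

LLR scores how surprising the cooccurrence is given the marginal counts, so a table where the two items are independent scores about zero, while cosine only looks at the overlap of the two vectors.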

It sounds like you are doing item-item similarities for recommendations, not 
actually calculating user-history based recs, is that true? 

You bring up a point that we're finding too. I'm not so sure we need or want a 
recommender query API that is separate from the Solr query API. What we are 
doing on our demo site is putting the output of the Solr-recommender where Solr 
can index it. Our web app framework then allows very flexible queries against 
Solr, using simple user history, producing the typical user-history based 
recommendations, or mixing/boosting based on metadata or contextual data. If we 
leave the recommender query API in Solr we get web app framework integration 
for free.
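
As a rough illustration of a user-history query going through the plain Solr query API, here is a sketch that builds the request parameters. The field name "similar_items" (the indexed recommender output) and the "id" field are assumptions for the example, not the demo site's actual schema; q, fq, fl and rows are standard Solr parameters.

```python
from urllib.parse import urlencode


def quote(term):
    # Quote a term for the Solr query syntax, escaping backslashes and quotes.
    return '"%s"' % str(term).replace("\\", "\\\\").replace('"', '\\"')


def history_query(history_item_ids, field="similar_items", rows=10):
    """Build Solr /select parameters for user-history based recs.

    Assumes each item document has a multi-valued field (hypothetical
    name: similar_items) holding the IDs of its similar items.
    """
    if not history_item_ids:
        raise ValueError("need at least one item in the user's history")
    terms = " OR ".join(quote(i) for i in history_item_ids)
    return {
        # Match items whose indicator field overlaps the user's history.
        "q": "%s:(%s)" % (field, terms),
        # Filter out items the user has already interacted with.
        "fq": "-id:(%s)" % terms,
        "fl": "id,score",
        "rows": rows,
    }


if __name__ == "__main__":
    # Prints the URL-encoded query string for a two-item history.
    print(urlencode(history_query(["item42", "item7"])))
```

The web app only has to supply the user's history; ranking is whatever Solr's scoring gives for the overlap.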

Another point is where the data is stored for the running system. If we allow 
Solr to index from any storage service that it supports then we also get free 
integration with most any web app framework and storage service. For the demo 
site we put the data in a DB and have Solr index it from there. We also store 
the user history and metadata there. This is supported by most web app 
frameworks out of the box. You could go a different route and use almost any 
storage system/file system/content format since Solr supports a wide variety.

Given a fully flexible Solr standard query and indexing scheme all you need do 
is tweak the query or data source a bit and you have an item-set recommender 
(shopping cart) or a contextual recommender (for example boost recs from a 
category) or a pure metadata/content based recommender.  
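
A sketch of the "tweak the query" idea: the same kind of query, fed the current cart instead of the long-term history, with an optional category boost. This is one possible reading, not what the demo site does. The field names "similar_items" and "category" are hypothetical; the boost uses the bq parameter of Solr's edismax query parser.

```python
def quote(term):
    # Quote a term for the Solr query syntax, escaping backslashes and quotes.
    return '"%s"' % str(term).replace("\\", "\\\\").replace('"', '\\"')


def cart_query(cart_item_ids, boost_category=None, boost=2.0,
               item_field="similar_items", category_field="category",
               rows=10):
    """Build Solr parameters for an item-set (cart) recommendation.

    Field names are assumptions for illustration only.
    """
    if not cart_item_ids:
        raise ValueError("cart is empty")
    items = " OR ".join(quote(i) for i in cart_item_ids)
    params = {
        "defType": "edismax",
        # Items whose indicator field overlaps the cart contents.
        "q": "%s:(%s)" % (item_field, items),
        # Do not recommend what is already in the cart.
        "fq": "-id:(%s)" % items,
        "fl": "id,score",
        "rows": rows,
    }
    if boost_category is not None:
        # Contextual boost: prefer, but do not require, this category.
        params["bq"] = "%s:%s^%s" % (
            category_field, quote(boost_category), boost)
    return params
```

Dropping the item clause and querying only metadata fields would give the pure content-based variant with the same plumbing.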

If the query and storage are left to Solr+web app framework then the GitHub 
version is complete, if not done. Solr still needs LLR in the more-like-this 
queries. Term weights to encode strength scores would also be nice, and I agree 
that both of these could use some work.

BTW, lest we forget, this does not imply the Solr-recommender is better than 
Myrrix or the Mahout-only recommenders. There needs to be some careful 
comparison of results. Michael, did you do offline or A/B tests during your 

Just to add a note of encouragement for the idea of better integration between 
Mahout and Solr:

On, we've recently converted our recommender, which computes 
similarity scores w/Mahout, from storing scores and running queries w/Postgres, 
to doing all that in Solr.  It's been a big improvement, both in terms of 
indexing speed, and more importantly, the flexibility of the queries we can 
write.  I believe that having scoring built in to the query engine is a key 
feature for recommendations.  More and more I am coming to believe that 
recommendation should just be considered as another facet of search: as one 
among many variables the system may take into account when presenting relevant 
information to the user.  In our system, we still clearly separate search from 
recommendations, and we probably will always do that to some extent, but I 
think we will start to blend the queries more so that there will be essentially 
a continuum of query options including more or less "user preference" data.

I think what I'm talking about may be a bit different than what Pat is 
describing (in implementation terms), since we do LLR calculations off-line in 
Mahout and then bulk load them into Solr.  We took one of Ted's earlier 
suggestions to heart, and simply ignored the actual numeric scores: we index 
the top N similar items for each item.  Later we may incorporate numeric scores 
in Solr as term weights.  If people are looking for things to do :) I think 
that would be a great software contribution that could spur this effort onward 
since it's difficult to accomplish right now given the Solr/Lucene indexing 
interfaces, but is already supported by the underlying data model and query 


> Excellent. From Ellen's description the first Music use may be an implicit 
> preference based recommender using synthetic data? I'm quickly discovering 
> how flexible Solr use is in many of these cases.
> Here's another use you may have thought of:
> Shopping cart recommenders, as goes the intuition, are best modeled as 
> recommending from similar item-sets. If you store all shopping carts as your 
> training data (play lists, watch lists etc.) then as a user adds things to 
> their cart you query for the most similar past carts. Combine the results 
> intelligently and you'll have an item set recommender. Solr is built to do 
> this item-set similarity. We tried to do this for an ecom site with pure 
> Mahout but the similarity calc in real time stymied us. We knew we'd need 
> Solr but couldn't devote the resources to spin it up.
> On the Con-side Solr has a lot of stuff you have to work around. It also does 
> not have the ideal similarity measure for many uses (cosine is ok but LLR 
> would probably be better). You don't want stop word filtering, stemming, 
> white space based tokenizing or n-grams. You would like explicit weighting. A 
> good thing about Solr is how well it integrates with virtually any doc store 
> independent of the indexing and query. A bit of an oval peg for a round hole.
> It looks like the similarity code is replaceable if not pluggable. Much of 
> the rest could be trimmed away by config or adherence to conventions I 
> suspect. In the demo site I'm working on I've had to adopt some slightly 
> hacky conventions that I'll describe some day.
> Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
> Things to note:
> 1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity 
> job. Currently there is only cooccurrence for sparsification, which is far 
> from optimal. This might take the form of a cross RSJ with two DRMs as input. 
> I can't commit to this but would commit to adding it to the XRecommenderJob.
> 2) output to Solr needs a lot of options implemented and tested. The hand-run 
> test should be made into some junits. I'm slowly doing this.
> 3) the Solr query API is unimplemented unless someone else is working on 
> that. I'm building one in a demo site but it looks to me like a static 
> recommender API is not going to be all that useful and maybe a document 
> describing how to do it with the Solr query interface would be best, 
> especially for a first step. The reasoning here is that it is so tempting to 
> mix in metadata to the recommendation query that a static API is not so 
> obvious. For the demo site the recommender API will be prototyped in a bunch 
> of ways using models and controllers in Rails. If I'm the one to do a 
> Java Solr-recommender query API it will be after experimenting a bit.
> Can someone introduce me to Ellen and Tim?
> The one large-ish feature that I think would find general use would be a high 
> performance classifier trainer.
> For a cleanup sort of thing it would be good to fully integrate the streaming 
> k-means into the normal clustering commands while revamping the command line 
> API.
> Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it 
> can make 0.9.
> For recommendations, I think that the demo system that Pat started with the 
> elaborations by Ellen and Tim would be very good to have.
> I would be happy to collaborate with somebody on these but am not at all 
> likely to have time to actually do them end to end.
>> Moving closer to 1.0, removing cruft, etc.  Do we have any more major 
>> features planned for 1.0?  I think we said during 0.8 that we would try to 
>> follow pretty quickly w/ another release.
>> -Grant
>>> Sounds right in principle but perhaps a bit soon.
>>> What would define the release?
>>>> Anyone interested in thinking about 0.9 in the early Nov. time frame?
>>>> -Grant
