Just to add a note of encouragement for the idea of better integration
between Mahout and Solr:
On safariflow.com, we've recently converted our recommender, which
computes similarity scores w/Mahout, from storing scores and running
queries w/Postgres, to doing all that in Solr. It's been a big
improvement, both in terms of indexing speed, and more importantly, the
flexibility of the queries we can write. I believe that having scoring
built in to the query engine is a key feature for recommendations. More
and more I am coming to believe that recommendation should just be
considered as another facet of search: as one among many variables the
system may take into account when presenting relevant information to the
user. In our system, we still clearly separate search from
recommendations, and we probably will always do that to some extent, but
I think we will start to blend the queries more so that there will be
essentially a continuum of query options including more or less "user
preference" data.
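
To make the "continuum" idea concrete, here is a minimal sketch, not our production code. It assumes a hypothetical schema with a `text` field for ordinary search and a multi-valued `similar_items` field holding item IDs, and it assumes terms and IDs are plain alphanumerics that need no query escaping. The single `preference_boost` knob slides a query between pure search and pure recommendation.

```python
def blended_query(text_terms, history_item_ids, preference_boost):
    """Build Solr request parameters mixing text search with user history.

    Field names `text` and `similar_items` are hypothetical.
    preference_boost == 0 yields a plain search; larger values weight
    the user's history more heavily via Lucene's ^boost syntax.
    """
    clauses = []
    if text_terms:
        clauses.append("text:(" + " ".join(text_terms) + ")")
    if history_item_ids and preference_boost > 0:
        ids = " ".join(history_item_ids)
        clauses.append("similar_items:(%s)^%s" % (ids, preference_boost))
    # Fall back to match-all when neither kind of signal is present.
    return {"q": " ".join(clauses) if clauses else "*:*", "rows": 10}


params = blended_query(["jazz"], ["b1", "b2"], 0.5)
print(params["q"])
```

With the default OR operator both clauses are optional, so documents matching the text, the history, or both are all scored by the same engine.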
I think what I'm talking about may be a bit different than what Pat is
describing (in implementation terms), since we do LLR calculations
off-line in Mahout and then bulk load them into Solr. We took one of
Ted's earlier suggestions to heart, and simply ignored the actual
numeric scores: we index the top N similar items for each item. Later
we may incorporate numeric scores in Solr as term weights. If people
are looking for things to do :) I think that would be a great software
contribution that could spur this effort onward since it's difficult to
accomplish right now given the Solr/Lucene indexing interfaces, but is
already supported by the underlying data model and query engine.
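
A rough sketch of the "top N, ignore the scores" step, assuming the off-line job emits (item, similar item, LLR score) triples. The function name and the `similar_items` field are made up for illustration; the resulting dicts are the kind of documents one could send to Solr's JSON update handler, with the similar item IDs indexed as plain terms.

```python
from collections import defaultdict


def top_n_similar_docs(scored_pairs, n):
    """Turn (item_id, similar_item_id, score) triples into index documents.

    Scores are used only to rank and truncate; they are dropped from the
    output, so each similar item becomes an unweighted term.
    """
    by_item = defaultdict(list)
    for item, other, score in scored_pairs:
        by_item[item].append((score, other))

    docs = []
    for item in sorted(by_item):
        # Highest score first; ties broken by ID so output is deterministic.
        ranked = sorted(by_item[item], key=lambda so: (-so[0], so[1]))
        docs.append({"id": item,
                     "similar_items": [other for _, other in ranked[:n]]})
    return docs


pairs = [("a", "b", 5.0), ("a", "c", 9.0), ("a", "d", 1.0), ("b", "a", 5.0)]
print(top_n_similar_docs(pairs, 2))
```

Keeping the numeric scores as term weights would mean carrying `score` through to the document instead of discarding it, which is the part that is awkward with the current indexing interfaces.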
-Mike
On 10/2/13 12:19 PM, Pat Ferrel wrote:
Excellent. From Ellen's description the first Music use may be an implicit
preference based recommender using synthetic data? I'm quickly discovering how
flexible Solr use is in many of these cases.
Here's another use you may have thought of:
Shopping cart recommenders, as goes the intuition, are best modeled as
recommending from similar item-sets. If you store all shopping carts as your
training data (play lists, watch lists etc.) then as a user adds things to
their cart you query for the most similar past carts. Combine the results
intelligently and you'll have an item set recommender. Solr is built to do this
item-set similarity. We tried to do this for an ecom site with pure Mahout but
the similarity calc in real time stymied us. We knew we'd need Solr but
couldn't devote the resources to spin it up.
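
An in-memory sketch of the item-set idea, for illustration only. It swaps in Jaccard overlap as the cart similarity (Solr would score with its own similarity, not this), and "combine the results intelligently" is reduced to a similarity-weighted vote. All names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap of two item collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for_cart(current_cart, past_carts, k=3, n=5):
    """Recommend items from the k past carts most similar to the current one."""
    current = set(current_cart)
    scored = sorted(((jaccard(current, cart), i)
                     for i, cart in enumerate(past_carts)),
                    key=lambda si: (-si[0], si[1]))
    votes = Counter()
    for sim, i in scored[:k]:
        if sim <= 0:
            continue  # carts with no overlap contribute nothing
        for item in set(past_carts[i]) - current:
            votes[item] += sim  # each similar cart votes with its similarity
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


past = [["milk", "bread", "eggs"], ["milk", "bread", "butter"],
        ["tv", "cable"], ["milk", "eggs"]]
print(recommend_for_cart(["milk", "bread"], past))
```

The linear scan over every past cart is exactly the real-time cost that an inverted index avoids.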
On the con side, Solr has a lot of stuff you have to work around. It also does
not have the ideal similarity measure for many uses (cosine is OK but LLR would
probably be better). You don't want stop word filtering, stemming, white space
based tokenizing or n-grams. You would like explicit weighting. A good thing
about Solr is how well it integrates with virtually any doc store independent
of the indexing and query. A bit of an oval peg for a round hole.
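
For reference, a small sketch of the log-likelihood ratio (LLR) score on a 2x2 contingency table, written in the entropy form commonly used for cooccurrence tests. The function names are illustrative; k11 is the count of events where both items occur, k12 and k21 where only one occurs, and k22 where neither does.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of cooccurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


print(llr(1, 0, 0, 1))      # small table, perfectly associated
print(llr(10, 10, 10, 10))  # independent counts score near zero
```

Unlike cosine, the score grows with the amount of evidence, so a pair seen together once does not outrank a pair with strong, repeated association.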
It looks like the similarity code is replaceable if not pluggable. Much of the
rest could be trimmed away by config or adherence to conventions I suspect. In
the demo site I'm working on I've had to adopt some slightly hacky conventions
that I'll describe some day.
On Oct 1, 2013, at 10:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
Pat,
Ellen and some folks in Britain have been working with some data I produced
from synthetic music fans.
On Tue, Oct 1, 2013 at 2:22 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
Hi Ellen,
On Oct 1, 2013, at 12:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
As requested,
Pat, meet Ellen.
Ellen, meet Pat.
On Tue, Oct 1, 2013 at 8:46 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:
Tunneling (rat-holing?) into the cross-recommender and Solr+Mahout version.
Things to note:
1) The pure Mahout XRecommenderJob needs a cross-LLR or a cross-similarity job.
Currently there is only cooccurrence for sparsification, which is far from
optimal. This might take the form of a cross RSJ with two DRMs as input. I
can't commit to this but would commit to adding it to the XRecommenderJob.
2) Output to Solr needs a lot of options implemented and tested. The hand-run
test should be made into some JUnit tests. I'm slowly doing this.
3) The Solr query API is unimplemented unless someone else is working on that.
I'm building one in a demo site but it looks to me like a static recommender
API is not going to be all that useful and maybe a document describing how to
do it with the Solr query interface would be best, especially for a first step.
The reasoning here is that it is so tempting to mix in metadata to the
recommendation query that a static API is not so obvious. For the demo site the
recommender API will be prototyped in a bunch of ways using models and
controllers in Rails. If I'm the one to do a Java Solr-recommender query
API it will be after experimenting a bit.
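
To illustrate what a cross job with two inputs would compute in point 1, here is a toy in-memory sketch, not the XRecommenderJob itself. It assumes two hypothetical user-to-items maps, one per action type (say purchases and views), counts cross-cooccurrences between them, and derives the 2x2 contingency table that a cross-LLR test would score for each pair.

```python
from collections import Counter
from itertools import product


def cross_cooccurrence(primary, secondary):
    """Count users who did primary action on item a AND secondary on item b.

    primary, secondary: dict of user id -> set of item ids.
    """
    counts = Counter()
    for user in primary.keys() & secondary.keys():
        for a, b in product(primary[user], secondary[user]):
            counts[(a, b)] += 1
    return counts


def contingency(a, b, primary, secondary, counts):
    """Return (k11, k12, k21, k22) over all users for the pair (a, b)."""
    users = primary.keys() | secondary.keys()
    n_a = sum(1 for u in users if a in primary.get(u, ()))
    n_b = sum(1 for u in users if b in secondary.get(u, ()))
    k11 = counts[(a, b)]
    # both, a only, b only, neither
    return k11, n_a - k11, n_b - k11, len(users) - n_a - n_b + k11


purchases = {"u1": {"p1"}, "u2": {"p1", "p2"}, "u3": {"p2"}}
views = {"u1": {"v1"}, "u2": {"v1", "v2"}, "u4": {"v2"}}
counts = cross_cooccurrence(purchases, views)
print(contingency("p1", "v1", purchases, views, counts))
```

Sparsifying on the raw counts alone is the cooccurrence-only behavior described above; scoring each table with LLR and keeping the top pairs is the proposed improvement.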
Can someone introduce me to Ellen and Tim?
On Sep 28, 2013, at 10:59 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
The one large-ish feature that I think would find general use would be a high
performance classifier trainer.
For the cleanup sort of thing, it would be good to fully integrate the streaming
k-means into the normal clustering commands while revamping the command line
API.
Dmitriy's recent scala work would help quite a bit before 1.0. Not sure it can
make 0.9.
For recommendations, I think that the demo system that Pat started with the
elaborations by Ellen and Tim would be very good to have.
I would be happy to collaborate with somebody on these but am not at all likely
to have time to actually do them end to end.
On Sep 28, 2013, at 12:40, Grant Ingersoll <gsing...@apache.org> wrote:
Moving closer to 1.0, removing cruft, etc. Do we have any more major features
planned for 1.0? I think we said during 0.8 that we would try to follow pretty
quickly w/ another release.
-Grant
On Sep 28, 2013, at 12:33 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
Sounds right in principle but perhaps a bit soon.
What would define the release?
On Sep 27, 2013, at 7:48, Grant Ingersoll <gsing...@apache.org> wrote:
Anyone interested in thinking about 0.9 in the early Nov. time frame?
-Grant
--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com