Just to add a note of encouragement for the idea of better integration
between Mahout and Solr:
On safariflow.com, we've recently converted our recommender, which
computes similarity scores w/Mahout, from storing scores and running
queries w/Postgres, to doing all that in Solr. It's been a
but that's no excuse not to do better. I'll certainly share when I know
more :)
-Mike
On Oct 9, 2013, at 6:13 AM, Michael Sokolov msoko...@safaribooksonline.com
wrote:
On 7/23/13 7:26 PM, Pat Ferrel wrote:
Honestly not trying to make this more complicated but…
From past experience I strongly suspect item similarity rank is not something
we want to lose, so unless someone has a better idea I'll just order the IDs in
the fields and call it good for now.
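Pat's "order the IDs in the fields" idea can be sketched in a few lines. The function name, field format, and item IDs below are made up for illustration; this is not actual Mahout or Solr API, just the encoding he describes, where rank survives as term order even though the weights are dropped:

```python
def similarity_row_to_field(row, top_n=10):
    # Sort external IDs by descending similarity score, keep the top N,
    # and join them into one field value; rank survives as term order.
    ranked = sorted(row.items(), key=lambda kv: -kv[1])
    return " ".join(item_id for item_id, _ in ranked[:top_n])

row = {"book-42": 0.91, "book-7": 0.33, "book-19": 0.57}
print(similarity_row_to_field(row))  # book-42 book-19 book-7
```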
On 07/22/2013 12:20 PM, Pat Ferrel wrote:
My understanding of the Solr proposal puts B's row similarity matrix in a vector per
item. That means each row is turned into terms (the external IDs); I'm not sure how
the weights of each term are encoded.
This is the key question for me. The best idea I've
On Jul 22, 2013 at 11:07 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
Michael Sokolov msoko...@safaribooksonline.com wrote:
So you are proposing just grabbing the top N scoring related items
and indexing them without regard to weight? Effectively
quantizing the weights to 1 for those items, and 0 for everything else?
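The quantization being asked about can be sketched directly (hypothetical item IDs; the function name is invented for illustration): every item in the top N gets weight 1, everything else gets 0.

```python
def quantize_top_n(row, top_n):
    # Weight 1 for the top-N scoring items, 0 for everything else.
    keep = set(sorted(row, key=row.get, reverse=True)[:top_n])
    return {item: (1 if item in keep else 0) for item in row}

print(quantize_top_n({"a": 0.9, "b": 0.2, "c": 0.5}, 2))  # {'a': 1, 'b': 0, 'c': 1}
```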
For each user, use their training data as
input to the recommender, and see if it recommends the data in the
evaluation set or not.
Is this what the precision/recall test is actually doing?
--
Michael Sokolov
Senior Architect
Safari Books Online
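A minimal sketch of what a precision/recall hold-out test computes, assuming hypothetical item IDs; this is the generic metric, not Mahout's actual evaluator:

```python
def precision_recall(recommended, held_out):
    # Hits are recommended items that also appear in the held-out evaluation set.
    rec, held = set(recommended), set(held_out)
    hits = len(rec & held)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(held) if held else 0.0
    return precision, recall

p, r = precision_recall(["a", "b", "c"], ["b", "d"])  # precision 1/3, recall 1/2
```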
is as good as the next. In the framework I think it just
randomly picks a subset of the data. You could also split by time;
that's a defensible way to do it. Training data up to time t and test
data after time t.
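The time-based split described above can be sketched as follows, assuming (user, item, timestamp) triples; this is illustrative, not the framework's actual splitter:

```python
def time_split(events, t):
    # events: (user, item, timestamp) triples.
    # Train on everything strictly before t, test on everything at or after t.
    train = [e for e in events if e[2] < t]
    test = [e for e in events if e[2] >= t]
    return train, test

events = [("u1", "i1", 100), ("u1", "i2", 205), ("u2", "i3", 150)]
train, test = time_split(events, 200)
```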
On Fri, Jun 7, 2013 at 7:51 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote: