Hello folks,
I see that it's possible to use Mahout to train a naive Bayes
classifier using n-grams as features (or I guess, strictly speaking,
Mahout can be used to generate sequence files containing n-grams; I
suspect the naive Bayes trainer is indifferent to the form of the features
it trains on).
Yes. Should work to use character n-grams. There are oddities in the
stats because the different n-grams are not independent, but Naive Bayes
methods are in such a state of sin that it shouldn't hurt any worse.
No... I don't think that there is a capability built in to generate the
character
Hi Dean,
I might be wrong, but try googling for shingling. It could be something to
start with.
Cheers
Jens
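
For readers unfamiliar with the term, here is a minimal sketch of character n-gram generation (shingling) in plain Python. It is not Mahout code and does not write sequence files; the function names char_ngrams and ngram_counts are made up for illustration, and the lowercasing and whitespace handling are assumptions rather than anything prescribed in this thread.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams (shingles) of text.

    Assumption: lowercase the input and collapse whitespace runs first.
    """
    text = " ".join(text.lower().split())
    # A string shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, n=3):
    """Count each n-gram, giving a bag-of-n-grams feature vector."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    print(char_ngrams("abcd", 2))
    print(ngram_counts("Naive Bayes", 3).most_common(3))
```

Counts like these could then serve as the term features a naive Bayes trainer consumes. Adjacent n-grams overlap, which is the lack of independence mentioned above.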
2013/10/9 Ted Dunning ted.dunn...@gmail.com
Yes. Should work to use character n-grams. There are oddities in the
stats because the different n-grams are not independent, but Naive
Just to add a note of encouragement for the idea of better integration
between Mahout and Solr:
On safariflow.com, we've recently converted our recommender, which
computes similarity scores w/Mahout, from storing scores and running
queries w/Postgres, to doing all that in Solr. It's been a
An example of a naive Bayes classifier trained on character n-grams is the
LangDetect library
(see http://code.google.com/p/language-detection/).
Agree with Ted that it should be relatively easy to build one.
On Wednesday, October 9, 2013 6:40 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
Mike,
Thanks for the vote of confidence!
On Wed, Oct 9, 2013 at 6:13 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
Just to add a note of encouragement for the idea of better integration
between Mahout and Solr:
On safariflow.com, we've recently converted our recommender, which
Solr uses cosine similarity for its queries. The implementation on GitHub uses
Mahout LLR for calculating the item-item similarity matrix, but when you do the
more-like-this query at runtime Solr uses cosine. This can be fixed in Solr,
but I'm not sure how much work it would be.
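
To make the two measures concrete, here is a small Python sketch, not taken from the GitHub implementation mentioned above. The llr function follows the usual entropy-based log-likelihood ratio over a 2x2 co-occurrence table (k11 = both items together, k12 and k21 = one without the other, k22 = neither). The function names and the example counts are illustrative assumptions.

```python
import math


def x_log_x(x):
    # By convention 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values from floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(llr(10, 10, 10, 10))   # independent items: close to zero
    print(llr(20, 1, 1, 1000))   # strongly associated items: large
    print(cosine([1, 1, 0], [1, 0, 0]))
```

LLR asks whether a co-occurrence is statistically surprising, while cosine only measures the angle between two vectors, so the two can rank the same pairs differently.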
It sounds like you are doing
On 10/9/13 3:08 PM, Pat Ferrel wrote:
Solr uses cosine similarity for its queries. The implementation on GitHub uses
Mahout LLR for calculating the item-item similarity matrix but when you do the
more-like-this query at runtime Solr uses cosine. This can be fixed in Solr,
not sure how much
1) Using the user history for the current user in a more-like-this query
against the item-item similarity matrix will produce a user-history based
recommendation. Simply fetching the item-item history row for a particular item
will give you the item-similarity based recs with no account of user
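
The difference between the two kinds of recommendation can be sketched with a toy in-memory stand-in for the Solr index. Everything here is hypothetical: the indicators table, the function names, and the scoring, which simply counts overlapping items where a real Solr query would apply its own weighting.

```python
from collections import Counter

# Hypothetical item-item similarity rows: item -> items flagged as similar.
indicators = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def item_based_recs(item):
    """Fetch the similarity row for one item; no user history involved."""
    return sorted(indicators.get(item, ()))


def history_based_recs(history, top_n=3):
    """Score each unseen item by how much its row overlaps the history."""
    seen = set(history)
    scores = Counter()
    for item, similar in indicators.items():
        if item in seen:
            continue
        overlap = len(similar & seen)
        if overlap:
            scores[item] = overlap
    # Highest overlap first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


if __name__ == "__main__":
    print(item_based_recs("c"))
    print(history_based_recs(["a", "b"]))
```

In this sketch the user's history plays the role of the query and each item's similarity row plays the role of an indexed document.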
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
On 10/9/13 3:08 PM, Pat Ferrel wrote:
Solr uses cosine similarity for its queries. The implementation on
github uses Mahout LLR for calculating the item-item similarity matrix but
when you do the
On Wed, Oct 9, 2013 at 2:07 PM, Pat Ferrel p...@occamsmachete.com wrote:
2) What you are doing is something else that I was calling a shopping-cart
recommender. You are using the item-set in the current cart and finding
similar, what, items? A different way to tackle this is to store all other
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
It sounds like you are doing item-item similarities for recommendations,
not actually calculating user-history based recs, is that true?
Yes that's true so far. Our recommender system has the ability to