Naive bayes and character n-grams

2013-10-09 Thread Dean Jones
Hello folks, I see that it's possible to use Mahout to train a naive Bayes classifier using n-grams as features (or I guess, strictly speaking, Mahout can be used to generate sequence files containing n-grams; I suspect the naive Bayes trainer is indifferent to the form of features it trains on).

Re: Naive bayes and character n-grams

2013-10-09 Thread Ted Dunning
Yes. Should work to use character n-grams. There are oddities in the stats because the different n-grams are not independent, but Naive Bayes methods are in such a state of sin that it shouldn't hurt any worse. No... I don't think that there is a capability built in to generate the character

Re: Naive bayes and character n-grams

2013-10-09 Thread Jens Bonerz
Hi Dean, I might be wrong, but try googling for shingling... could be something to start with. Cheers Jens 2013/10/9 Ted Dunning ted.dunn...@gmail.com Yes. Should work to use character n-grams. There are oddities in the stats because the different n-grams are not independent, but Naive
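
As a rough illustration of the shingling idea mentioned above, the sketch below turns a string into overlapping character n-grams and joins them into a whitespace-separated token string, so that any tokenizer that splits on whitespace would see each n-gram as one feature. The function names `char_ngrams` and `ngram_tokens` are hypothetical and not part of Mahout; replacing spaces with `_` is an assumption made here only so the n-grams survive whitespace tokenization.

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams (shingles) of text, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string shorter than n yields no n-grams at all.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_tokens(text, n=3):
    """Lowercase text and emit its n-grams as one whitespace-separated string."""
    # Spaces inside an n-gram become "_" so each n-gram stays a single token.
    return " ".join(g.replace(" ", "_") for g in char_ngrams(text.lower(), n))


if __name__ == "__main__":
    print(char_ngrams("mahout", 3))
    print(ngram_tokens("Naive Bayes", 3))
```

The resulting token strings could then be fed to whatever pipeline already handles word tokens, which is one way to work around the lack of a built-in character n-gram generator.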

Re: Solr-recommender

2013-10-09 Thread Michael Sokolov
Just to add a note of encouragement for the idea of better integration between Mahout and Solr: On safariflow.com, we've recently converted our recommender, which computes similarity scores w/Mahout, from storing scores and running queries w/Postgres, to doing all that in Solr. It's been a

Re: Naive bayes and character n-grams

2013-10-09 Thread Suneel Marthi
An example of a Naive-Bayes classifier trained on character n-grams is the LangDetect library. (see http://code.google.com/p/language-detection/) Agree with Ted that it should be relatively easy to build one. On Wednesday, October 9, 2013 6:40 AM, Ted Dunning ted.dunn...@gmail.com wrote:
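
To make "relatively easy to build one" concrete, here is a minimal sketch of a multinomial naive Bayes classifier over character trigrams with add-one (Laplace) smoothing. It is neither LangDetect's implementation nor Mahout's trainer; the class name `CharNgramNB`, the smoothing constant, and the toy training sentences are all assumptions chosen for illustration. As noted earlier in the thread, overlapping n-grams are not independent, so the scores are useful for ranking labels rather than as calibrated probabilities.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Toy multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha                          # Laplace smoothing constant
        self.label_docs = Counter()                 # documents seen per label
        self.feature_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                          # all n-grams seen in training

    def train(self, text, label):
        grams = char_ngrams(text, self.n)
        self.label_docs[label] += 1
        self.feature_counts[label].update(grams)
        self.vocab.update(grams)

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.label_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, docs in self.label_docs.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(docs / total_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = CharNgramNB()
    nb.train("the quick brown fox jumps over the lazy dog", "en")
    nb.train("this is the house of the king", "en")
    nb.train("der schnelle braune fuchs springt in den wald", "de")
    nb.train("das ist das haus des mannes", "de")
    print(nb.predict("the king is in the house"))
    print(nb.predict("das ist der wald"))
```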

Re: Solr-recommender

2013-10-09 Thread Ted Dunning
Mike, Thanks for the vote of confidence! On Wed, Oct 9, 2013 at 6:13 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: Just to add a note of encouragement for the idea of better integration between Mahout and Solr: On safariflow.com, we've recently converted our recommender, which

Re: Solr-recommender

2013-10-09 Thread Pat Ferrel
Solr uses cosine similarity for its queries. The implementation on GitHub uses Mahout LLR for calculating the item-item similarity matrix, but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr, not sure how much work. It sounds like you are doing
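
To show how the two measures mentioned above differ, the sketch below computes a log-likelihood ratio (LLR) score from a 2x2 co-occurrence table, using the entropy-based formulation, next to a plain cosine similarity between two vectors. This is a standalone illustration, not Mahout's or Solr's code; the helper `cooccurrence_llr` and the way the contingency counts are derived from sets of user ids are assumptions.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: both events, k12: first only, k21: second only, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence_llr(users_a, users_b, n_users):
    """LLR between two items, given the sets of users who interacted with each."""
    both = len(users_a & users_b)
    a_only = len(users_a) - both
    b_only = len(users_b) - both
    neither = n_users - both - a_only - b_only
    return llr(both, a_only, b_only, neither)


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    print(cooccurrence_llr({1, 2, 3}, {2, 3, 4}, n_users=100))
    print(cosine([1, 1, 0], [1, 0, 1]))
```

LLR asks whether two items co-occur more often than chance would predict, while cosine measures the angle between vectors, so the two can rank the same candidates differently.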

Re: Solr-recommender

2013-10-09 Thread Michael Sokolov
On 10/9/13 3:08 PM, Pat Ferrel wrote: Solr uses cosine similarity for its queries. The implementation on GitHub uses Mahout LLR for calculating the item-item similarity matrix, but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr, not sure how much

Re: Solr-recommender

2013-10-09 Thread Pat Ferrel
1) Using the user history for the current user in a more-like-this query against the item-item similarity matrix will produce a user-history based recommendation. Simply fetching the item-item history row for a particular item will give you the item-similarity based recs with no account of user
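
The distinction drawn above can be sketched in memory, without Solr. In this hypothetical setup, `similar` maps each item to a row of similar items with scores (standing in for the item-item similarity matrix). An item-similarity recommendation reads a single row, while a user-history recommendation scores every candidate item by how much of the user's history appears in that candidate's row. Summing the scores is an assumption made for simplicity; a real more-like-this query would apply Solr's own scoring.

```python
def item_similarity_recs(similar, item, k=3):
    """Top-k items from one row of the item-item matrix (no user history)."""
    row = similar.get(item, {})
    # Sort by descending score, breaking ties by item id for determinism.
    return sorted(row, key=lambda j: (-row[j], j))[:k]


def user_history_recs(similar, history, k=3):
    """Top-k items whose similarity rows overlap most with the user's history."""
    seen = set(history)
    scores = {}
    for candidate, row in similar.items():
        if candidate in seen:
            continue  # do not recommend items the user already has
        score = sum(row.get(h, 0.0) for h in history)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]


if __name__ == "__main__":
    # Hypothetical similarity rows; the scores are made up for illustration.
    similar = {
        "a": {"b": 2.0, "c": 1.0},
        "b": {"a": 2.0, "d": 3.0},
        "c": {"a": 1.0},
        "d": {"b": 3.0},
    }
    print(item_similarity_recs(similar, "a"))
    print(user_history_recs(similar, ["a", "b"]))
```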

Re: Solr-recommender

2013-10-09 Thread Ted Dunning
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: On 10/9/13 3:08 PM, Pat Ferrel wrote: Solr uses cosine similarity for its queries. The implementation on GitHub uses Mahout LLR for calculating the item-item similarity matrix, but when you do the

Re: Solr-recommender

2013-10-09 Thread Ted Dunning
On Wed, Oct 9, 2013 at 2:07 PM, Pat Ferrel p...@occamsmachete.com wrote: 2) What you are doing is something else that I was calling a shopping-cart recommender. You are using the item-set in the current cart and finding similar, what, items? A different way to tackle this is to store all other

Re: Solr-recommender

2013-10-09 Thread Ted Dunning
On Wed, Oct 9, 2013 at 12:54 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true? Yes that's true so far. Our recommender system has the ability to