RE: Naive bayes and character n-grams

2013-10-10 Thread simon.2.thompson
Hey Dean, what do you mean by character n-grams? If you mean things like ab or ui2 then given that there are so few characters compared to words is there a problem that can't be solved without a look-up table for ny (where y 4ish ) Or are you looking at y 4 ish because if so then do you run

Re: Naive bayes and character n-grams

2013-10-10 Thread Dean Jones
Hi Suneel, On 9 October 2013 14:27, Suneel Marthi suneel_mar...@yahoo.com wrote: an example of a Naive-Bayes classifier trained on character n-grams is the LangDetect library. (see http://code.google.com/p/language-detection/) Agree with Ted that it should be relatively easy to build one.
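
For readers unfamiliar with the idea, here is a minimal sketch of a Naive Bayes classifier over character n-grams. It is not LangDetect's implementation and not Mahout code; the class name NgramNaiveBayes, the helper char_ngrams, the n-gram range 1..3, the add-one smoothing and the tiny training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=3):
    """Return every character n-gram of length 1..n_max found in text."""
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, n_max=3):
        self.n_max = n_max
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.docs = Counter()   # label -> number of training texts seen
        self.vocab = set()      # all n-grams seen across every label

    def train(self, text, label):
        grams = char_ngrams(text.lower(), self.n_max)
        self.counts.setdefault(label, Counter()).update(grams)
        self.docs[label] += 1
        self.vocab.update(grams)

    def classify(self, text):
        grams = char_ngrams(text.lower(), self.n_max)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed, far too small for real use).
model = NgramNaiveBayes()
model.train("the quick brown fox jumps over the lazy dog", "en")
model.train("this is a simple sentence in english", "en")
model.train("der schnelle braune fuchs springt über den faulen hund", "de")
model.train("das ist ein einfacher satz auf deutsch", "de")
print(model.classify("the quick brown fox"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one smoothing keeps an unseen n-gram from zeroing out a class. A real language detector would need far more training text per language than this toy example.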

Re: Naive bayes and character n-grams

2013-10-10 Thread Dean Jones
Hi Si, On 10 October 2013 07:59, simon.2.thomp...@bt.com wrote: what do you mean by character n-grams? If you mean things like ab or ui2 then given that there are so few characters compared to words is there a problem that can't be solved without a look-up table for ny (where y 4ish ) Or are

Re: Naive bayes and character n-grams

2013-10-10 Thread Ted Dunning
For language detection, you are going to have a hard time doing better than one of the standard packages for the purpose. See here: http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html On Thu, Oct 10, 2013 at 1:01 AM, Dean Jones dean.m.jo...@gmail.com wrote: Hi Si,

Re: Naive bayes and character n-grams

2013-10-10 Thread Dean Jones
On 10 October 2013 12:46, Ted Dunning ted.dunn...@gmail.com wrote: For language detection, you are going to have a hard time doing better than one of the standard packages for the purpose. See here: http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html Thanks for

Re: Naive bayes and character n-grams

2013-10-10 Thread Ted Dunning
Cool. Sounds like you are ahead of the game. Sent from my iPhone On Oct 10, 2013, at 13:15, Dean Jones dean.m.jo...@gmail.com wrote: On 10 October 2013 12:46, Ted Dunning ted.dunn...@gmail.com wrote: For language detection, you are going to have a hard time doing better than one of the

Re: Naive bayes and character n-grams

2013-10-10 Thread Suneel Marthi
Dean, just a thought. You should be able to create new language models (with LangDetect) if there's Wikipedia content for the specific language; I had to do it in the past for Pashto and Malaysian. On Thursday, October 10, 2013 8:16 AM, Dean Jones dean.m.jo...@gmail.com wrote: On 10

Re: Solr-recommender

2013-10-10 Thread Pat Ferrel
The issue of offline tests is often misunderstood, I suspect. While I agree with Ted, it might help to explain a bit. For myself, I'd say offline testing is a requirement, but not for comparing two disparate recommenders. Companies like Amazon and Netflix, as well as others on record, have a