Hey Dean,
What do you mean by character n-grams? If you mean things like ab or ui2,
then, given that there are so few characters compared to words, is there a
problem that can't be solved without a look-up table for n^y (where y < 4ish)?
Or are you looking at y > 4ish, because if so then do you run
Hi Suneel,
On 9 October 2013 14:27, Suneel Marthi suneel_mar...@yahoo.com wrote:
An example of a Naive Bayes classifier trained on character n-grams is the
LangDetect library.
(see http://code.google.com/p/language-detection/)
Agree with Ted that it should be relatively easy to build one.
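For anyone wondering what "Naive Bayes on character n-grams" amounts to, here is a minimal sketch in Python. It assumes a multinomial model over character 1- to 3-grams with add-one smoothing; `char_ngrams` and `CharNgramNB` are hypothetical names, and this is not LangDetect's code or API, just the general technique.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, max_n=3):
    """Return all character n-grams of length 1..max_n, padding the ends with a space."""
    padded = f" {text.lower()} "
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class CharNgramNB:
    """Hypothetical multinomial Naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-grams seen
        self.docs = Counter()               # language -> training documents seen
        self.vocab = set()                  # every n-gram seen in any language

    def train(self, text, lang):
        grams = char_ngrams(text, self.max_n)
        self.counts[lang].update(grams)
        self.totals[lang] += len(grams)
        self.docs[lang] += 1
        self.vocab.update(grams)

    def classify(self, text):
        grams = char_ngrams(text, self.max_n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang in self.counts:
            # Work in log space so long inputs don't underflow to zero.
            score = math.log(self.docs[lang] / n_docs)
            denom = self.totals[lang] + v
            for g in grams:
                # Counter returns 0 for unseen n-grams; +1 is Laplace smoothing.
                score += math.log((self.counts[lang][g] + 1) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


nb = CharNgramNB()
nb.train("the quick brown fox jumps over the lazy dog", "en")
nb.train("der schnelle braune fuchs springt über den faulen hund", "de")
print(nb.classify("the lazy dog"), nb.classify("der faule hund"))
```

Trained on just those two toy sentences, this prints `en de`; a real detector needs far more text per language. Keeping counts in a `Counter` stores only the n-grams actually seen, rather than a dense table over every possible n-gram.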
Hi Si,
On 10 October 2013 07:59, simon.2.thomp...@bt.com wrote:
What do you mean by character n-grams? If you mean things like ab or
ui2, then, given that there are so few characters compared to words, is
there a problem that can't be solved without a look-up table for n^y (where
y < 4ish)?
Or are
For language detection, you are going to have a hard time doing better than
one of the standard packages for the purpose. See here:
http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
On Thu, Oct 10, 2013 at 1:01 AM, Dean Jones dean.m.jo...@gmail.com wrote:
Hi Si,
On 10 October 2013 12:46, Ted Dunning ted.dunn...@gmail.com wrote:
For language detection, you are going to have a hard time doing better than
one of the standard packages for the purpose. See here:
http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
Thanks for
Cool. Sounds like you are ahead of the game.
On Oct 10, 2013, at 13:15, Dean Jones dean.m.jo...@gmail.com wrote:
On 10 October 2013 12:46, Ted Dunning ted.dunn...@gmail.com wrote:
For language detection, you are going to have a hard time doing better than
one of the
Dean,
Just a thought.
You should be able to create new language models (with LangDetect) if there's
Wikipedia content for the specific language; I had to do it in the past for
Pashto and Malaysian.
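If you go down that route, the core of a new profile is just counting character n-grams over a large plain-text dump of the language. A rough sketch in Python, assuming the text has already been extracted from Wikipedia, one paragraph per line; `build_profile`, the `min_count` threshold and the JSON layout are illustrative assumptions, not LangDetect's profile format or tooling, so check the project's own documentation for how it expects new profiles to be generated.

```python
import json
import sys
from collections import Counter


def build_profile(lines, lang, max_n=3, min_count=2):
    """Hypothetical helper: count character 1..max_n-grams over lines of text."""
    freq = Counter()
    for line in lines:
        text = line.strip().lower()
        if not text:
            continue  # skip blank lines so they don't inflate space counts
        padded = f" {text} "
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                freq[padded[i:i + n]] += 1
    # Dropping rare n-grams keeps the profile small; the threshold is arbitrary.
    kept = {g: c for g, c in freq.items() if c >= min_count}
    return {"name": lang, "freq": kept}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python build_profile.py ps < pashto_wiki.txt > ps.json
    json.dump(build_profile(sys.stdin, sys.argv[1]), sys.stdout, ensure_ascii=False)
```

`ensure_ascii=False` keeps non-Latin scripts such as Pashto readable in the output file instead of escaping every character.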
On Thursday, October 10, 2013 8:16 AM, Dean Jones dean.m.jo...@gmail.com wrote:
On 10
The issue of offline tests is often misunderstood, I suspect. While I agree with
Ted, it might help to explain a bit.
For myself, I'd say offline testing is a requirement, but not for comparing two
disparate recommenders. Companies like Amazon and Netflix, as well as others on
record, have a