Dean,

Just a thought: you should be able to create new language models (with
LangDetect) if there's Wikipedia content for the specific language. I had
to do that in the past for Pashto and Malaysian.
On Thursday, October 10, 2013 8:16 AM, Dean Jones wrote:

> On 10 October 2013 12:46, Ted Dunning wrote:
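Suneel's Wikipedia suggestion boils down to collecting the most frequent character n-grams of a language from a large text sample. A minimal sketch of the idea in Python (the function name, parameters, and the toy sentence are invented for illustration; LangDetect's real profiles are built by its own tooling in its own format):

```python
from collections import Counter

def build_profile(text, n=3, size=300):
    """Toy 'language profile': the most frequent character n-grams in text."""
    padded = " " + text.lower() + " "  # pad so word boundaries show up in n-grams
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    return [g for g, _ in grams.most_common(size)]

# A short stand-in for real Wikipedia article text:
profile = build_profile("the quick brown fox jumps over the lazy dog")
print(profile[:5])
```

In practice one would feed in a large dump of Wikipedia articles for the target language rather than a single sentence.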
Cool. Sounds like you are ahead of the game.

Sent from my iPhone

On Oct 10, 2013, at 13:15, Dean Jones wrote:

> On 10 October 2013 12:46, Ted Dunning wrote:
>> For language detection, you are going to have a hard time doing better than
>> one of the standard packages for the purpose. See here:
>>
>> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
On 10 October 2013 12:46, Ted Dunning wrote:
> For language detection, you are going to have a hard time doing better than
> one of the standard packages for the purpose. See here:
>
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
>
Thanks for the pointer, Ted.
For language detection, you are going to have a hard time doing better than
one of the standard packages for the purpose. See here:
http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
On Thu, Oct 10, 2013 at 1:01 AM, Dean Jones wrote:

> Hi Si,
>
> On 10 October 2013 07:59, wrote:
Hi Si,

On 10 October 2013 07:59, wrote:
>
> what do you mean by character n-grams? If you mean things like "&ab" or
> "ui2" then given that there are so few characters compared to words, is
> there a problem that can't be solved without a look-up table? Or are you
> looking at n > 4 ish, because if so then do you run into the issue of a
> sudden space explosion?
Hi Suneel,
On 9 October 2013 14:27, Suneel Marthi wrote:
> an example of a Naive-Bayes classifier trained on character n-grams is the
> LangDetect library.
> (see http://code.google.com/p/language-detection/)
>
> Agree with Ted that it should be relatively easy to build one.
>
Thanks.
Hey Dean,

what do you mean by character n-grams? If you mean things like "&ab" or "ui2"
then given that there are so few characters compared to words, is there a
problem that can't be solved without a look-up table? Or are you looking at
n > 4 ish, because if so then do you run into the issue of a sudden space
explosion?
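For readers of the archive: "character n-grams" here just means every overlapping window of n characters in the raw text (essentially the "shingling" Jens mentions elsewhere in the thread). A minimal sketch, with an invented helper name:

```python
def char_ngrams(text, n):
    """Every overlapping substring of length n -- a shingled view of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("hello", 3))  # ['hel', 'ell', 'llo']
```

On the space-explosion worry: the table could in principle hold up to |alphabet|**n entries, but only the n-grams actually observed in training data need to be stored, which is presumably one reason n is usually kept small.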
an example of a Naive-Bayes classifier trained on character n-grams is the
LangDetect library.
(see http://code.google.com/p/language-detection/)
Agree with Ted that it should be relatively easy to build one.
On Wednesday, October 9, 2013 6:40 AM, Ted Dunning wrote:

> Yes. Should work to use character n-grams.
Hi Dean,
i might be wrong. but try googling for "shingling"... could be something to
start with.
Cheers
Jens
2013/10/9 Ted Dunning
> Yes. Should work to use character n-grams. There are oddities in the
> stats because the different n-grams are not independent, but Naive Bayes
> methods are
Yes. Should work to use character n-grams. There are oddities in the
stats because the different n-grams are not independent, but Naive Bayes
methods are in such a state of sin that it shouldn't hurt any worse.
No... I don't think that there is a capability built in to generate the
character n-grams.
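Ted's two points (character n-grams work, and Naive Bayes tolerates their non-independence) can be sketched end to end. Everything below is illustrative only: the training strings are toy stand-ins for real corpora, and add-one smoothing is just the simplest choice:

```python
import math
from collections import Counter

def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(samples, n=3):
    """samples: {language: training text} -> per-language log-prob tables."""
    counts = {lang: Counter(ngrams(t, n)) for lang, t in samples.items()}
    vocab = set().union(*counts.values())
    models = {}
    for lang, c in counts.items():
        total = sum(c.values()) + len(vocab)  # add-one smoothing denominator
        models[lang] = {g: math.log((c[g] + 1) / total) for g in vocab}
        models[lang]["__unseen__"] = math.log(1 / total)
    return models

def classify(text, models, n=3):
    # Sum log-probs as if the overlapping n-grams were independent. They are
    # not (the "state of sin"), but the argmax is usually unaffected.
    def score(model):
        return sum(model.get(g, model["__unseen__"]) for g in ngrams(text, n))
    return max(models, key=lambda lang: score(models[lang]))

models = train({
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt ueber den faulen hund",
})
print(classify("the lazy brown cat", models))  # -> en on this toy data
```

Real systems train on far more text per language and normalize the input (Unicode, case, punctuation) before extracting n-grams.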