date:20140902

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars

The Dutch tokenizer is a little bit different from the others, because of words with a ' in them. That works fine, unless the text does not have a ', but a , which happens quite often. Since I am not able to edit the Java program (little knowledge), could someone have a look at this please?

Sorry, wrong thread, should be : Dutch tokenizer.

2014-09-02 Thread R.J. Baars

The Dutch tokenizer is a little bit different from the others, because of words with a ' in them. That works fine, unless the text does not have a ', but a , which happens quite often. Since I am not able to edit the Java program (little knowledge), could someone have a look at this

French names list

2014-09-02 Thread R.J. Baars

Thanks a lot for the tip. I am including it in the Dutch rules. There is some work in translating the messages. Maybe it is possible to standardize the examples (not the messages, these are significant for users), so translation is easier. Ruud

Re: French names list

2014-09-02 Thread R.J. Baars

I did part of the work while moving to Dutch. I removed some rules, because in the Netherlands we don't normally translate or transform the names of fantasy people, and some names were really quite local ;-) It is quite easy now to translate to any language. I could start by translating it to English,

WMF Individual Engagement Grants

2014-09-02 Thread Daniel Naber

FYI, the Wikimedia Foundation welcomes proposals for their 'Individual Engagement Grants'. If you want to develop the LT Wikipedia integration, this might be interesting to you: https://meta.wikimedia.org/wiki/Grants:IEG For example, one might extend LT WikiCheck to do spell checking on

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread Andriy Rysin

In UkrainianWordTokenizer.java I am replacing the Unicode apostrophes U+2019 and U+02BC with the good old single quote (') to unify all apostrophe handling. If the Dutch case is similar you could borrow this code. Andriy On 09/02/2014 08:11 AM, R.J. Baars wrote: The Dutch tokenizer is a little bit
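For readers who want to see what such a replacement could look like, here is a minimal sketch in Java. It is not the code from UkrainianWordTokenizer.java; the class name ApostropheNormalizer and the method normalize are made up for illustration, and it assumes the text is normalized before it reaches the tokenizer.

```java
// Hypothetical helper, not part of LanguageTool: maps typographic
// apostrophes to the plain ASCII apostrophe before tokenization.
public final class ApostropheNormalizer {

    private ApostropheNormalizer() {
        // utility class, no instances
    }

    /**
     * Replaces U+2019 (right single quotation mark) and U+02BC
     * (modifier letter apostrophe) with the ASCII apostrophe U+0027.
     * Each replacement is one char for one char, so the string length
     * and all character offsets stay the same.
     */
    public static String normalize(String text) {
        return text
                .replace('\u2019', '\'')
                .replace('\u02BC', '\'');
    }

    public static void main(String[] args) {
        // Dutch plural written with a typographic apostrophe
        System.out.println(normalize("auto\u2019s")); // prints auto's
    }
}
```

Because the replacement keeps the length of the text unchanged, match positions reported later would still point at the right characters in the original input.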

Bug is disambiguator?

2014-09-02 Thread Dominique Pellé

Hi, have a look at the following debug output of LanguageTool, where a token gets the nonsensical POS tag N.* (multiple times) after a disambiguation rule is applied. Is it a bug in the disambiguator? Or am I writing an incorrect disambiguation rule? $ echo An eil| java -jar

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars

I could, if I were able to code. I only do things on the XML level. Ruud In UkrainianWordTokenizer.java I am replacing the Unicode apostrophes U+2019 and U+02BC with the good old single quote (') to unify all apostrophe handling. If the Dutch case is similar you could borrow this code. Andriy On
