Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars
The Dutch tokenizer is a little bit different from the others because of words that contain an apostrophe ('). That works fine unless the text uses ’ instead of ', which happens quite often. Since I am not able to edit the Java code (I have little knowledge of it), could someone please have a look at this?

Sorry, wrong thread; this should have gone to: Dutch tokenizer.

2014-09-02 Thread R.J. Baars
The Dutch tokenizer is a little bit different from the others because of words that contain an apostrophe ('). That works fine unless the text uses ’ instead of ', which happens quite often. Since I am not able to edit the Java code (I have little knowledge of it), could someone have a look at this

French names list

2014-09-02 Thread R.J. Baars
Thanks a lot for the tip. I am including it in the Dutch rules. Translating the messages takes some work. Maybe the examples could be standardized (not the messages, which matter to users), so that translation is easier. Ruud

Re: French names list

2014-09-02 Thread R.J. Baars
I did part of the work while moving it to Dutch. I removed some rules because in the Netherlands we don't normally translate or adapt the names of fictional characters, and some names were really quite local ;-) It is now quite easy to translate to any language. I could start by translating it to English,

WMF Individual Engagement Grants

2014-09-02 Thread Daniel Naber
FYI, the Wikimedia Foundation welcomes proposals for their 'Individual Engagement Grants'. If you want to develop the LT Wikipedia integration, this might be interesting to you: https://meta.wikimedia.org/wiki/Grants:IEG For example, one might extend LT WikiCheck to do spell checking on

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread Andriy Rysin
In UkrainianWordTokenizer.java I replace the Unicode apostrophes U+2019 and U+02BC with the good old single quote (') to unify all apostrophe handling. If the Dutch case is similar, you could borrow this code. Andriy On 09/02/2014 08:11 AM, R.J. Baars wrote: The Dutch tokenizer is a little bit
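For readers who want to try the idea before touching the Dutch tokenizer, here is a minimal sketch of the normalization Andriy describes: map U+2019 and U+02BC to the ASCII apostrophe first, then tokenize. This is not the actual UkrainianWordTokenizer code; the class `ApostropheNormalizer`, its methods, and the simple punctuation-splitting regex are hypothetical and only illustrate the approach.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating apostrophe normalization; not LanguageTool API. */
public class ApostropheNormalizer {

    // U+2019 RIGHT SINGLE QUOTATION MARK and U+02BC MODIFIER LETTER APOSTROPHE,
    // the two characters mentioned in the thread.
    private static final char RIGHT_SINGLE_QUOTE = '\u2019';
    private static final char MODIFIER_APOSTROPHE = '\u02BC';

    /** Replaces both typographic apostrophes with the ASCII apostrophe (U+0027). */
    public static String normalize(String text) {
        return text.replace(RIGHT_SINGLE_QUOTE, '\'')
                   .replace(MODIFIER_APOSTROPHE, '\'');
    }

    /**
     * Naive tokenizer (an assumption for this sketch): splits on whitespace and
     * common punctuation but not on the apostrophe, so words like "zo'n" stay whole.
     */
    public static List<String> tokenize(String text) {
        return Arrays.asList(normalize(text).trim().split("[\\s.,;:!?]+"));
    }

    public static void main(String[] args) {
        // Prints [zo'n, auto]
        System.out.println(tokenize("zo\u2019n auto."));
    }
}
```

One design caveat: U+2019 is also the typographic closing single quotation mark, so a blanket replacement will turn a closing quote such as ‘woord’ into an apostrophe as well; whether that matters depends on how the tokenizer and rules treat quotes.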

Bug in disambiguator?

2014-09-02 Thread Dominique Pellé
Hi, have a look at the following debug output of LanguageTool, where a token gets the nonsensical POS tag N.* (multiple times) after a disambiguation rule is applied. Is it a bug in the disambiguator, or am I writing an incorrect disambiguation rule? $ echo An eil| java -jar

Re: Current limitations of MorfologikSpeller

2014-09-02 Thread R.J. Baars
I could, if I were able to code. I only work at the XML level. Ruud In UkrainianWordTokenizer.java I am replacing Unicode apostrophes U+2019 and U+02BC into old good single quote (') to unify all apostrophe handling. If Dutch case is similar you could borrow this code. Andriy On