The Dutch tokenizer is a little bit different from the others, because
of words with a ' in them.
That works fine, unless the text has another character in place of the ', which
happens quite often.
Since I am not able to edit the Java program (little knowledge), could
someone have a look at this, please?
Thanks a lot for the tip. I am including it in the Dutch rules.
There is some work in translating the messages.
Maybe it is possible to standardize the examples (not the messages, these
are significant for users), so translation is easier.
Ruud
I did part of the work while moving to Dutch. I removed some rules,
because in the Netherlands, we don't normally translate or transform names of
fantasy people, and some names were really quite local ;-)
It is quite easy now to translate to any language. I could start by
translating it to English,
FYI, the Wikimedia Foundation welcomes proposals for their 'Individual
Engagement Grants'. If you want to develop the LT Wikipedia integration,
this might be interesting to you:
https://meta.wikimedia.org/wiki/Grants:IEG
For example, one might extend LT WikiCheck to do spell checking on
In UkrainianWordTokenizer.java I am replacing the Unicode apostrophes
U+2019 and U+02BC with the good old single quote (') to unify all apostrophe
handling. If the Dutch case is similar, you could borrow this code.
Andriy
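
The normalization described above could look roughly like the following minimal sketch. This is not the actual code from UkrainianWordTokenizer.java; the class name ApostropheNormalizer and the method normalize are hypothetical names chosen for illustration, and it assumes the replacement runs on the raw text before tokenizing.

```java
public class ApostropheNormalizer {

    // U+2019 RIGHT SINGLE QUOTATION MARK, often typed instead of an apostrophe
    private static final char TYPOGRAPHIC_APOSTROPHE = '\u2019';
    // U+02BC MODIFIER LETTER APOSTROPHE
    private static final char MODIFIER_APOSTROPHE = '\u02BC';

    // Hypothetical helper: maps both Unicode apostrophes to the ASCII single quote
    // so that later tokenizer logic only has to handle one character.
    public static String normalize(String text) {
        return text
            .replace(TYPOGRAPHIC_APOSTROPHE, '\'')
            .replace(MODIFIER_APOSTROPHE, '\'');
    }

    public static void main(String[] args) {
        // "zo’n auto" written with U+2019 becomes "zo'n auto"
        System.out.println(normalize("zo\u2019n auto"));
    }
}
```

Each replacement is one character for one character, so the text length and token positions stay the same. Note that U+2019 is also used as a closing single quotation mark, so a blanket replacement like this would turn those into ' as well.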
On 09/02/2014 08:11 AM, R.J. Baars wrote:
The Dutch tokenizer is a little bit
Hi
Have a look at the following debug output
of LanguageTool, where a token gets a nonsensical
POS tag N.* (multiple times) after a disambiguation
rule is applied.
Is it a bug in the disambiguator?
Or am I writing an incorrect disambiguation rule?
$ echo An eil| java -jar
I could, if I were able to code. I only do things on the XML level.
Ruud