Re: spell checker enhancement

2014-09-16 Thread R.Baars
Okay, thanks. Good to know this. This is, however, not the time to do that; there is currently a lot more to do to improve the quality of what is already in Dutch LT. I will keep it in mind for later, when I have a clearer view of the remaining spelling issues. (Currently 10% of the collected

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
2014-09-16 14:43 GMT+02:00 R.Baars: > How is that done? > > Ruud > > Do you mean ignoring tagged words in spellchecking (even if they are not in the dictionary)? It's a configurable option of the speller (at least in the Morfologik speller rule). It takes a single line of Java code. Jaume > > On 16-09-14
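The following is a minimal Java sketch of what an "ignore tagged words" option does. It assumes the tagger has already attached a POS tag (or none) to each token in context. The class `TaggedWordSpellerSketch` and its methods are hypothetical illustrations. They are not the Morfologik speller rule's API, and they are not the actual line of code referred to above.

```java
import java.util.Set;

// Hypothetical sketch, not LanguageTool's speller code: it only illustrates
// the "ignore tagged words" option discussed in the thread.
class TaggedWordSpellerSketch {
    private final Set<String> dictionary;
    private final boolean ignoreTaggedWords;

    TaggedWordSpellerSketch(Set<String> dictionary, boolean ignoreTaggedWords) {
        this.dictionary = dictionary;
        this.ignoreTaggedWords = ignoreTaggedWords;
    }

    // posTag is whatever the tagger assigned to this token in context, or null if none.
    boolean isMisspelled(String token, String posTag) {
        if (dictionary.contains(token)) {
            return false; // known word: never flagged
        }
        if (ignoreTaggedWords && posTag != null) {
            return false; // unknown to the speller, but tagged (e.g. via multiwords.txt)
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("stad", "in");
        TaggedWordSpellerSketch withOption = new TaggedWordSpellerSketch(dict, true);
        TaggedWordSpellerSketch withoutOption = new TaggedWordSpellerSketch(dict, false);
        System.out.println(withOption.isMisspelled("Dhabi", "NPCNG00"));    // false
        System.out.println(withoutOption.isMisspelled("Dhabi", "NPCNG00")); // true
        System.out.println(withOption.isMisspelled("Dhabi", null));         // true
    }
}
```

With the option switched off, a tag does not help: only the dictionary decides.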

Re: spell checker enhancement

2014-09-16 Thread R.Baars
How is that done? Ruud On 16-09-14 at 13:23, Jaume Ortolà i Font wrote: 2014-09-16 13:03 GMT+02:00 R.Baars: I see. This is probably of no use for spellchecking, but it is for postagging. It gives no suggestions, but it can be used for avoiding false pos

Re: spell checker enhancement

2014-09-16 Thread R.J. Baars
Since we are discussing spell checking, I would like to add another thought. Spell checking only knows whether a word is 'in the list' or 'not in the list'. This is interpreted as 'known to be correctly spelled' vs. 'might be a spelling mistake'. There is another status: 'known to be wrong' (kn
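One way to model the three statuses is an enum returned by the check. A separate list of known misspellings is consulted before the dictionary. This is a hypothetical sketch: `SpellStatus`, `ThreeStateSpellerSketch` and the example words are illustrative and do not describe an existing LanguageTool feature.

```java
import java.util.Set;

// Hypothetical three-state result; the enum and list names are illustrative only.
enum SpellStatus { KNOWN_CORRECT, UNKNOWN, KNOWN_WRONG }

class ThreeStateSpellerSketch {
    private final Set<String> knownCorrect; // the normal speller dictionary
    private final Set<String> knownWrong;   // an extra list of definite misspellings

    ThreeStateSpellerSketch(Set<String> knownCorrect, Set<String> knownWrong) {
        this.knownCorrect = knownCorrect;
        this.knownWrong = knownWrong;
    }

    SpellStatus check(String word) {
        // Check the explicit error list first so it wins over an accidental dictionary entry.
        if (knownWrong.contains(word)) {
            return SpellStatus.KNOWN_WRONG;
        }
        if (knownCorrect.contains(word)) {
            return SpellStatus.KNOWN_CORRECT;
        }
        return SpellStatus.UNKNOWN; // "might be a spelling mistake"
    }

    public static void main(String[] args) {
        ThreeStateSpellerSketch speller =
            new ThreeStateSpellerSketch(Set.of("dokter"), Set.of("docter"));
        System.out.println(speller.check("dokter")); // KNOWN_CORRECT
        System.out.println(speller.check("docter")); // KNOWN_WRONG
        System.out.println(speller.check("xyzzy"));  // UNKNOWN
    }
}
```

A rule could then, for example, report KNOWN_WRONG with more confidence than UNKNOWN. How the statuses would be presented is left open here.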

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
2014-09-16 13:03 GMT+02:00 R.Baars: > I see. This is probably of no use for spellchecking, but it is for > postagging. > > It gives no suggestions, but it can be used to avoid false positives in spellchecking if you configure the speller to ignore tagged words. > > Does > Abu Dhabi NPCNG00 > c

Re: spell checker enhancement

2014-09-16 Thread R.Baars
I see. This is probably of no use for spellchecking, but it is for postagging. Does Abu Dhabi NPCNG00 cause both words to be tagged with that tag, or are they considered one token with that POS tag? (That might come in handy for just this kind of tagging.) Ruud On 16-09-14 at 12:56, Jaume Ortolà i

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
Hi, Ruud. I can't find any documentation. It is used in Polish, French, Catalan, Russian, Ukrainian and Spanish. Implementation: Enable it (Java). Create a "multiwords.txt" file in your resources folder like these [1]. The tokens are separated by white space and the tag is separated by a tab. Result
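The line format described above can be parsed roughly as follows: tokens are separated by white space, and the tag follows a tab. The class name is hypothetical, and this is not LanguageTool's own loader. It also says nothing about whether the tagger then treats the entry as one token or several; that question stays open in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one multiwords.txt line such as "Abu Dhabi<TAB>NPCNG00".
// Not LanguageTool's loader; names are illustrative.
class MultiwordLineSketch {
    final List<String> tokens;
    final String tag;

    MultiwordLineSketch(List<String> tokens, String tag) {
        this.tokens = tokens;
        this.tag = tag;
    }

    static MultiwordLineSketch parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab before tag: " + line);
        }
        String phrase = line.substring(0, tab).trim();
        String tag = line.substring(tab + 1).trim();
        // Any run of white space separates the tokens of the multi-word expression.
        List<String> tokens = Arrays.asList(phrase.split("\\s+"));
        return new MultiwordLineSketch(tokens, tag);
    }

    public static void main(String[] args) {
        MultiwordLineSketch entry = parse("Abu Dhabi\tNPCNG00");
        System.out.println(entry.tokens); // [Abu, Dhabi]
        System.out.println(entry.tag);    // NPCNG00
    }
}
```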

Re: spell checker enhancement

2014-09-16 Thread R.Baars
Jaume, thanks, but I am not sure. It depends on its implementation, I think. Where can I find more info? Ruud On 16-09-14 at 12:26, Jaume Ortolà i Font wrote: 2014-09-16 11:21 GMT+02:00 R.J. Baars: We don't agree. There is a spellchecker, but also a single word

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
2014-09-16 11:21 GMT+02:00 R.J. Baars: > We don't agree. There is a spellchecker, but also a single word ignore > list for it. > There are XML rules, but also a Simplereplace rule, a compounding rule. > > So apart from the hammer and the screwdriver, there are more tools. > > There is indeed anot

Re: spell checker enhancement

2014-09-16 Thread R.Baars
I know it will be simple to generate ignore rules like this, and I will probably do that as soon as they show up in the frequency table. Ruud On 16-09-14 at 12:01, Marcin Miłkowski wrote: > On 2014-09-16 at 11:21, R.J. Baars wrote: >> Marcin, >> >> We don't agree. There is a spellchecker, but

Re: spell checker enhancement

2014-09-16 Thread Marcin Miłkowski
On 2014-09-16 at 11:21, R.J. Baars wrote: > Marcin, > > We don't agree. There is a spellchecker, but also a single word ignore > list for it. Yes, but for multi-words, we'd have to use the disambiguator code internally anyway. You ask for yet another notation of the same thing. Notice also th

Re: spell checker enhancement

2014-09-16 Thread R.J. Baars
Marcin, We don't agree. There is a spellchecker, but also a single-word ignore list for it. There are XML rules, but also a Simplereplace rule and a compounding rule. So apart from the hammer and the screwdriver, there are more tools. But anyway, adding the most frequent ones to the disambiguator

Re: reminder: upcoming feature freeze

2014-09-16 Thread R.J. Baars
Then I will have to work hard to get all the errors out of the dictionaries... Ruud > Hi, > > just a reminder that feature freeze for LT 2.7 will start next week > (2014-09-22). The release is planned for 2014-09-29 > (http://wiki.languagetool.org/roadmap). > > Regards > Daniel

Re: Extra tokenizing char needed for Dutch

2014-09-16 Thread R.J. Baars
Great. Now I can build a rule around it. Ruud > On 2014-09-15 11:25, Marcin Miłkowski wrote: > >>> I tried adding that character (em dash) for all languages and all >>> tests >>> still work. Any objections to adding it directly to WordTokenizer so >>> that it affects all languages? >> >> I think

reminder: upcoming feature freeze

2014-09-16 Thread Daniel Naber
Hi, just a reminder that feature freeze for LT 2.7 will start next week (2014-09-22). The release is planned for 2014-09-29 (http://wiki.languagetool.org/roadmap). Regards Daniel

Re: Extra tokenizing char needed for Dutch

2014-09-16 Thread Daniel Naber
On 2014-09-15 11:25, Marcin Miłkowski wrote: >> I tried adding that character (em dash) for all languages and all >> tests >> still work. Any objections to adding it directly to WordTokenizer so >> that it affects all languages? > > I think it should be added universally. I have done that now.
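Making the em dash (U+2014) a token separator means that a word glued to it is no longer read as one long token. The toy tokenizer below, built on `java.util.StringTokenizer`, illustrates this. The delimiter set is an assumption for illustration only and is much shorter than whatever `WordTokenizer` actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch, not LanguageTool's WordTokenizer.
class EmDashTokenizerSketch {
    // Illustrative delimiter set; the last character is the em dash (U+2014).
    private static final String DELIMITERS = " \t\n,.;:!?\u2014";

    static List<String> tokenize(String text, boolean emDashIsDelimiter) {
        String delimiters = emDashIsDelimiter ? DELIMITERS : DELIMITERS.replace("\u2014", "");
        // returnDelims = true keeps separators (including spaces) as tokens of their own.
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters, true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "koffie\u2014thee";
        System.out.println(tokenize(text, false).size()); // 1: the whole string is one token
        System.out.println(tokenize(text, true).size());  // 3: "koffie", the dash, "thee"
    }
}
```

Once the dash is its own token, a rule can match on it directly.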

Re: spell checker enhancement

2014-09-16 Thread Marcin Miłkowski
On 2014-09-16 at 09:03, R.J. Baars wrote: > A word like 'Aviv' is not correct unless 'Tel' is before it. > So it is best to leave Tel and Aviv out of the spell checker. > That results in spell checking reporting errors for Aviv. > > In the disambiguator, there is the option to block that, by maki

Re: Some advice needed

2014-09-16 Thread Marcin Miłkowski
On 2014-09-16 at 06:25, Dominique Pellé wrote: > R.J. Baars wrote: > >> There is official advice for Dutch stating that, for readable text, an average of no more than 12 words per sentence is required. >> >> Since I can only make a rule per sentence, I made a rule, warning for >> sen
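A rule that sees one sentence at a time can only check a per-sentence limit, while the advice concerns an average over a text. The sketch below computes both so the difference is visible. It is hypothetical: the word-counting heuristic and the names are illustrative, and using 12 as a per-sentence limit is an assumption, since the advice only states an average.

```java
import java.util.List;

// Hypothetical sketch of a sentence-length check; names are illustrative.
class SentenceLengthSketch {
    static final int ADVISED_AVERAGE = 12; // figure from the Dutch advice quoted in the thread

    // Counts whitespace-separated chunks containing at least one letter or digit,
    // so stray punctuation such as a lone "-" is not counted as a word.
    static int countWords(String sentence) {
        int count = 0;
        for (String part : sentence.trim().split("\\s+")) {
            if (part.chars().anyMatch(Character::isLetterOrDigit)) {
                count++;
            }
        }
        return count;
    }

    // Per-sentence check: the only thing a single-sentence rule can test.
    static boolean exceeds(String sentence, int maxWords) {
        return countWords(sentence) > maxWords;
    }

    // Average over a whole text, which is what the advice actually refers to.
    static double averageWords(List<String> sentences) {
        if (sentences.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (String s : sentences) {
            total += countWords(s);
        }
        return (double) total / sentences.size();
    }

    public static void main(String[] args) {
        List<String> text = List.of(
            "Dit is een korte zin.",
            "Deze zin is een stuk langer en bevat daardoor meer dan twaalf woorden in totaal.");
        for (String s : text) {
            System.out.println(countWords(s) + " words, too long: " + exceeds(s, ADVISED_AVERAGE));
        }
        System.out.println("average: " + averageWords(text)); // average: 10.0
    }
}
```

In this example the second sentence (15 words) would be flagged by a per-sentence rule, even though the text as a whole averages 10 words and so meets the advice.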

Re: Dutch section of languagetool.org

2014-09-16 Thread Daniel Naber
On 2014-09-15 19:42, R.J. Baars wrote: > Sorry, I forgot to add the file. > >> For now, I translated and adjusted the languagetool.org/nl index page >> to >> work at least partly. It's online now at https://languagetool.org/nl/. The $title and $webstartText still need to be translated and I

spell checker enhancement

2014-09-16 Thread R.J. Baars
A word like 'Aviv' is not correct unless 'Tel' is before it. So it is best to leave Tel and Aviv out of the spell checker. That results in spell checking reporting errors for Aviv. In the disambiguator, there is the option to block that, by making an immunizing rule: Tel Avi
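The sketch below shows the effect such an immunizing rule is meant to have. 'Tel' and 'Aviv' stay out of the dictionary, and only the adjacent pair is exempt from spelling errors. It is a hypothetical Java illustration, not the disambiguator's XML rule syntax. It assumes whitespace tokens have already been removed, and `ImmunizePairSketch` and its data are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of what an immunizing rule achieves; not LanguageTool code.
class ImmunizePairSketch {
    // Maps the second word of a name to the word that must precede it (illustrative data).
    private static final Map<String, String> REQUIRED_PREVIOUS = Map.of("Aviv", "Tel");

    // Marks both tokens of every listed two-word name found in the token list.
    static boolean[] immunized(List<String> tokens) {
        boolean[] result = new boolean[tokens.size()];
        for (int i = 1; i < tokens.size(); i++) {
            String required = REQUIRED_PREVIOUS.get(tokens.get(i));
            if (required != null && required.equals(tokens.get(i - 1))) {
                result[i - 1] = true;
                result[i] = true;
            }
        }
        return result;
    }

    // Reports tokens missing from the dictionary unless they were immunized.
    static List<String> spellingErrors(List<String> tokens, Set<String> dictionary) {
        boolean[] skip = immunized(tokens);
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!skip[i] && !dictionary.contains(tokens.get(i))) {
                errors.add(tokens.get(i));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("hij", "woont", "in");
        System.out.println(spellingErrors(List.of("hij", "woont", "in", "Tel", "Aviv"), dict)); // []
        System.out.println(spellingErrors(List.of("hij", "woont", "in", "Aviv"), dict));        // [Aviv]
    }
}
```

A lone 'Aviv' or a lone 'Tel' is still reported, which is the behaviour asked for above.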