Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Dominique, thanks for taking the trouble to test it, etc. From my POV, the upshot of the discussion *so far* is that we should not split the grammar files, even though some of them are getting quite large. Correct me if I'm wrong. For me (on a six-year-old cheap computer), there is no problem to

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Marcin: > Classifying some of the words semantically might be really useful for > some rules. Indeed, I could not agree more. The most difficult part would be coming up with the semantic categories in a way that is not completely ad hoc. Everyone who has ever used a public library is probably awa

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] trunk/JLanguageTool/src/rules/de/grammar.xml

2012-05-16 Thread Daniel Naber
On Monday, 14 May 2012, Ruud Baars wrote: > Don't bother converting the Dutch xml. > I have already manually done that. > > Have to find the time to download the snapshot and get it tested. Just send it when you're ready - I have applied the automatic conversion to Dutch for now so I can remov

Re: [Languagetool] Any advice?

2012-05-16 Thread Marcin Miłkowski
On 2012-05-16 22:28, gulp21 wrote: >> As much as I hate passing the buck, I'm afraid writing such a rule is >> beyond my (pretty much non-existent) Java skills. > > I planned to write a Java rule for it, but I'm rather busy at the > moment. Unless somebody is quicker than me, or has a better id

Re: [Languagetool] Apache OpenOffice 3.4 and LT

2012-05-16 Thread Daniel Naber
On Saturday, 12 May 2012, Daniel Naber wrote: > Has anybody tried LT with the recently released Apache OpenOffice.org? > It works for me but the freeze-on-startup problem is worse than ever, > it freezes 45 seconds for me (compared to 4 seconds with the latest > LibreOffice). It turns out the pr

Re: [Languagetool] Any advice?

2012-05-16 Thread gulp21
> As much as I hate passing the buck, I'm afraid writing such a rule is > beyond my (pretty much non-existent) Java skills. I planned to write a Java rule for it, but I'm rather busy at the moment. Unless somebody is quicker than me, or has a better idea, I'll start working on it in a few weeks.

Re: [Languagetool] Any advice?

2012-05-16 Thread Jan Schreiber
This is excellent news! It pretty much answers a question I was going to ask on this list about automating part of the rule-creation process. With the recently updated http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/de/words-similar.txt and the Wikipedi

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Marcin Miłkowski
On 2012-05-16 20:10, Jan Schreiber wrote: > BTW, it should be possible to store at least those entities outside the > file itself, but I don't know how. --Jan Well, I had a look and it seems that you are using some of the entities to define fairly long regular expressions (disjunctions). Thi

Re: [Languagetool] Any advice?

2012-05-16 Thread Jan Schreiber
gulp21: > As there are many rules of that type, I would suggest that a general > WrongWordInContext-java-rule is created, because having many xml-rules > which only differ in the list of words seems to be absurd. I'm pretty sure that would help a lot, especially since Juan pointed out that the

Re: [Languagetool] Any advice?

2012-05-16 Thread Marcin Miłkowski
Hi all, Actually, word confusion is one area where a lot of experiments have been made. I also made an experiment with the Brill tagger and it worked really well with English. It should be easy with Dutch as well: http://marcinmilkowski.pl/downloads/automating_rules_full.pdf Unfortunately, the process

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Dominique Pellé
Jan Schreiber > I know that ridiculously huge file is a bit of a problem Are they? grammar.xml files are not that big. A text editor opens the biggest grammar.xml in a blink on my 5-year-old laptop. To make it easier to navigate when editing, I define folds in Vim with a modeline (see comment

Re: [Languagetool] Any advice?

2012-05-16 Thread Juan Martorell
The problem you set out is entirely semantic and common to all languages. There is no way to distinguish the two verbs other than semantically. That introduces a new category for comparison, perhaps category trees, and IMHO that would exceed the scope of the project. A possible shortcut is introducing sem

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Second thoughts: otoh, the earlier we do it, the less work it will be. I definitely agree that a file size of more than 1 MB is not very good. I wrote: > Daniel Naber wrote: >> But what about splitting up that file into its categories? We could have >> 5-10 smaller files rather than one large one

Re: [Languagetool] Any advice?

2012-05-16 Thread gulp21
There are some German rules which detect a word which is used in the wrong context, e.g. "Miene" (facial expression) and "Mine" (mine, lead). There is a list of words which are often used with "Miene" (verziehen, aufsetzen, gekränkt etc.), and words which are often used with "Mine" (explodieren
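
The idea described above (one generic rule driven by per-word context lists, instead of many near-identical rules) can be sketched roughly as follows. This is only an illustration in Python, not LanguageTool code: the names `CONTEXT_WORDS` and `find_wrong_words` are hypothetical, the word lists contain only the examples quoted in the message, and matching is on exact word forms with no lemmatization.

```python
import re

# Hypothetical data: context words per confusable word, limited to the
# examples quoted in the message above. A real list would be much longer.
CONTEXT_WORDS = {
    "Miene": {"verziehen", "aufsetzen", "gekränkt"},
    "Mine": {"explodieren"},
}


def find_wrong_words(sentence, context_words=CONTEXT_WORDS):
    """Return (found, suggested) pairs where a rival word fits the context better.

    Assumption: a word is suspicious if the sentence contains more context
    words of a rival than context words of the word itself.
    """
    tokens = re.findall(r"\w+", sentence)
    present = set(tokens)
    suggestions = []
    for token in tokens:
        if token not in context_words:
            continue
        # Count how many of this word's own context words occur in the sentence.
        own_hits = len(context_words[token] & present)
        for rival, rival_context in context_words.items():
            if rival == token:
                continue
            # Flag the word only when the rival's context is strictly stronger.
            if len(rival_context & present) > own_hits:
                suggestions.append((token, rival))
    return suggestions


if __name__ == "__main__":
    print(find_wrong_words("Er wollte keine Mine verziehen"))
```

With this layout, supporting a new confusable pair would only mean adding two entries to the dictionary rather than writing a new rule.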

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Daniel Naber wrote: > Feel free to do that, although some new spaces might be re-introduced as I > cannot set up my IDE for spaces/tabs on a per-project basis. Then let's forget that. If there is one thing on earth that I can't stand it's a mixture of spaces and tabs. It visually messes up indent

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Daniel Naber
On Wednesday, 16 May 2012, Jan Schreiber wrote: > One tiny thing is still bugging me: Since the file is so long, the > change in indentation (four spaces rather than two) results in a > noticeable increase of the file size. We could avoid this by using tabs > instead of spaces for indentation, tha

[Languagetool] Any advice?

2012-05-16 Thread R.J. Baars
There is quite a bit of word confusion going on in Dutch. An example: geplant (planted) versus gepland (planned). This is not a grammatical issue, but actually using the wrong word, thereby altering the intention of the sentence. Neither is wrong. Both are very common. Nevertheless, a warning is

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Daniel Naber wrote: > I tried another conversion, please let me know if this is okay now. > > Regards > Daniel > Everything seems okay now, thanks. I made a few trivial cosmetic changes to the German grammar file though. One tiny thing is still bugging me: Since the file is so long, the chang