Dominique, thanks for taking the trouble to test it, etc.
From my POV, the upshot of the discussion *so far* is that we should not
split the grammar files, even though some of them are getting quite
large. Correct me if I'm wrong. For me (on a cheap six-year-old
computer), there is no problem to
Marcin:
> Classifying some of the words semantically might be really useful for
> some rules.
Indeed, I could not agree more. The most difficult part would be coming
up with the semantic categories in a way that is not completely ad hoc.
Everyone who has ever used a public library is probably awa
On Monday, 14 May 2012, Ruud Baars wrote:
> Don't bother converting the Dutch xml.
> I have already manually done that.
>
> Have to find the time to download the snapshot and get it tested.
Just send it when you're ready - I have applied the automatic conversion to
Dutch for now so I can remov
On 2012-05-16 22:28, gulp21 wrote:
>> As much as I hate passing the buck, I'm afraid writing such a rule is
>> beyond my (pretty much non-existent) Java skills.
>
> I planned to write a Java rule for it, but I'm rather busy at the
> moment. Unless somebody is quicker than me, or has a better id
On Saturday, 12 May 2012, Daniel Naber wrote:
> Has anybody tried LT with the recently released Apache OpenOffice.org?
> It works for me but the freeze-on-startup problem is worse than ever,
> it freezes 45 seconds for me (compared to 4 seconds with the latest
> LibreOffice).
It turns out the pr
> As much as I hate passing the buck, I'm afraid writing such a rule is
> beyond my (pretty much non-existent) Java skills.
I planned to write a Java rule for it, but I'm rather busy at the
moment. Unless somebody is quicker than me, or has a better idea, I'll
start working on it in a few weeks.
This is excellent news! It pretty much answers a question I was going to
ask on this list about automating part of the rule-creation process.
With the recently updated
http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/de/words-similar.txt
and the Wikipedi
On 2012-05-16 20:10, Jan Schreiber wrote:
> BTW, it should be possible to store at least those entities outside the
> file itself, but I don't know how. --Jan
Well, I had a look and it seems that you are using some of the entities
to define fairly long regular expressions (disjunctions). Thi
gulp21:
> As there are many rules of that type, I would suggest that a general
> WrongWordInContext-java-rules is created, because having many xml-rules
> which only differ in the list of words seems to be absurd.
I'm pretty sure that would help a lot, especially since Juan pointed out
that the
Hi all,
Actually, word confusion is one area where a lot of experiments were
made. I also made an experiment with Brill tagger and it worked really
fine with English. It should be easy with Dutch as well:
http://marcinmilkowski.pl/downloads/automating_rules_full.pdf
Unfortunately, the process
Jan Schreiber:
> I know that ridiculously huge file is a bit of a problem
Is it? grammar.xml files are not that big.
A text editor opens the biggest grammar.xml in a blink on
my 5-year-old laptop.
To make it easier to navigate when editing, I define folds in Vim
with a modeline (see comment
The problem you set out is entirely semantic and common to all languages.
There is no way to distinguish the two verbs except semantically. That introduces
a new category for comparison, perhaps category trees, and IMHO that would
exceed the scope of the project.
A possible shortcut is introducing sem
Second thoughts: otoh, the earlier we do it, the less work it will be. I
definitely agree that a file size of more than 1 MB is not very good.
I wrote:
> Daniel Naber wrote:
>> But what about splitting up that file into its categories? We could have
>> 5-10 smaller files rather than one large one
There are some German rules which detect a word which is used in the
wrong context, e.g. "Miene" (facial expression) and "Mine" (mine, lead).
There is a list of words which are often used with "Miene" (verziehen,
aufsetzen, gekränkt etc.), and words which are often used with "Mine"
(explodieren
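
The context-word idea described above can be sketched roughly as follows. This is only an illustration in Python, not LanguageTool code: the names `CONTEXT_WORDS`, `CONFUSION_PAIRS` and `check_wrong_word` are hypothetical, the word lists contain only the examples visible in the message, and matching is on exact word forms with no lemmatization.

```python
# Hypothetical sketch of a "wrong word in context" check.
# Word lists hold only the examples quoted in the message above;
# a real rule would need far longer lists and inflected forms.
CONTEXT_WORDS = {
    "Miene": {"verziehen", "aufsetzen", "gekränkt"},
    "Mine": {"explodieren"},
}

# Each word maps to the word it is commonly confused with.
CONFUSION_PAIRS = {"Miene": "Mine", "Mine": "Miene"}


def check_wrong_word(tokens):
    """Return (index, word, suggestion) for every word whose sentence
    context matches its confusion partner better than itself."""
    findings = []
    for i, token in enumerate(tokens):
        partner = CONFUSION_PAIRS.get(token)
        if partner is None:
            continue
        # All other tokens in the sentence count as context.
        context = set(tokens[:i] + tokens[i + 1:])
        own_hits = len(context & CONTEXT_WORDS[token])
        partner_hits = len(context & CONTEXT_WORDS[partner])
        if partner_hits > own_hits:
            findings.append((i, token, partner))
    return findings


if __name__ == "__main__":
    print(check_wrong_word("Er wollte keine Mine verziehen".split()))
```

Because the lists are plain data, one generic function of this kind could serve many word pairs, which is the point of keeping the word lists separate from the rule logic.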
Daniel Naber wrote:
> Feel free to do that, although some new spaces might be re-introduced as I
> cannot set up my IDE for spaces/tabs on a per-project basis.
Then let's forget that. If there is one thing on earth that I can't
stand it's a mixture of spaces and tabs. It visually messes up
indent
On Wednesday, 16 May 2012, Jan Schreiber wrote:
> One tiny thing is still bugging me: Since the file is so long, the
> change in indentation (four spaces rather than two) results in a
> noticeable increase of the file size. We could avoid this by using tabs
> instead of spaces for indentation, tha
There is quite a bit of word confusion going on in Dutch. An example:
geplant (planted) versus gepland (planned).
This is not a grammatical issue, but actually using the wrong word,
thereby altering the intention of the sentence.
Neither is wrong. Both are very common. Nevertheless, a warning is
Daniel Naber wrote:
> I tried another conversion, please let me know if this is okay now.
>
> Regards
> Daniel
>
Everything seems okay now, thanks. I made a few trivial cosmetic changes
to the German grammar file though.
One tiny thing is still bugging me: Since the file is so long, the
chang