Hi,

Say I want to detect invalid use of the word "a" (= has, verb) instead of "à" (= at, preposition) in many French expressions such as:

    a nouveau -> à nouveau
    a plein temps -> à plein temps
    a rude épreuve -> à rude épreuve
    a vol d'oiseau -> à vol d'oiseau
    etc.

I wish I could write a rule pattern like this:

    <rule>
      <pattern>
        <marker><token>a</token></marker>
        <tokens>plein temps#chaque fois#rude épreuve#vol d’oiseau</tokens>
      </pattern>
      ...
    </rule>

Notice the <tokens> tag, with an 's', instead of <token>. The # characters and spaces inside <tokens>...#...#...</tokens> would be interpreted automatically, so that the above rule is equivalent to a much more verbose set of rules:

    <rule>
      <pattern>
        <marker><token>a</token></marker>
        <token>plein</token>
        <token>temps</token>
      </pattern>
    </rule>
    <rule>
      <pattern>
        <marker><token>a</token></marker>
        <token>chaque</token>
        <token>fois</token>
      </pattern>
    </rule>
    <rule>
      <pattern>
        <marker><token>a</token></marker>
        <token>rude</token>
        <token>épreuve</token>
      </pattern>
    </rule>
    <rule>
      <pattern>
        <marker><token>a</token></marker>
        <token>vol</token>
        <token>d</token>
        <token>’</token>
        <token>oiseau</token>
      </pattern>
    </rule>

In other words:

* Each # character inside <tokens>...#...#...</tokens> creates a new <rule>.
* The spaces inside <tokens>...</tokens> cause automatic tokenization, so that something like <tokens>rude épreuve</tokens> is automatically interpreted as <token>rude</token><token>épreuve</token>.

I'm curious whether rule maintainers would find it useful.

Regards
Dominique
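
P.S. To make the proposed expansion concrete, here is a minimal Python sketch of it. Everything in it is hypothetical: the function names `tokenize` and `expand_tokens` are made up for illustration, and the tokenization (split on whitespace, with each apostrophe as its own token) only mirrors the expansion written out in this message, not LanguageTool's actual tokenizer, which may split French text differently.

```python
import re
from xml.sax.saxutils import escape


def tokenize(phrase):
    """Split on whitespace and keep each apostrophe (' or ’) as its own token.

    Assumption: this mimics the expansion shown in the message, not the
    real LanguageTool tokenizer.
    """
    return re.findall(r"[^\s'’]+|['’]", phrase)


def expand_tokens(marker_word, tokens_text):
    """Return one <rule> string per '#'-separated phrase in tokens_text."""
    rules = []
    for phrase in tokens_text.split("#"):
        phrase = phrase.strip()
        if not phrase:
            continue  # ignore empty alternatives such as a trailing '#'
        lines = [
            "<rule>",
            "  <pattern>",
            f"    <marker><token>{escape(marker_word)}</token></marker>",
        ]
        lines += [f"    <token>{escape(tok)}</token>" for tok in tokenize(phrase)]
        lines += ["  </pattern>", "</rule>"]
        rules.append("\n".join(lines))
    return rules


if __name__ == "__main__":
    spec = "plein temps#chaque fois#rude épreuve#vol d’oiseau"
    for rule in expand_tokens("a", spec):
        print(rule)
```

Run as a script, this prints four <rule> elements, one per phrase. A real implementation would presumably do the expansion when the rule file is loaded rather than generate XML text.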