Hi

Say I want to detect invalid use of the word "a" (= has, verb)
instead of "à" (= at, preposition) in many French expressions
such as:

   a nouveau -> à nouveau
   a plein temps -> à plein temps
   a rude épreuve -> à rude épreuve
   a vol d'oiseau -> à vol d'oiseau
   etc.

I wish I could write a rule pattern like this:

  <rule>
    <pattern>
      <marker><token>a</token></marker>
      <tokens>plein temps#chaque fois#rude épreuve#vol d’oiseau</tokens>
    </pattern>
    ...
  </rule>

Notice the <tokens> tag, with an 's' instead of <token>.
The # character and space characters inside <tokens>...#...#...</tokens>
would be automatically interpreted in such a way that the above rule
is equivalent to a much more verbose set of rules:

  <rule>
    <pattern>
      <marker><token>a</token></marker>
      <token>plein</token>
      <token>temps</token>
    </pattern>
  </rule>
  <rule>
    <pattern>
      <marker><token>a</token></marker>
      <token>chaque</token>
      <token>fois</token>
    </pattern>
  </rule>
  <rule>
    <pattern>
      <marker><token>a</token></marker>
      <token>rude</token>
      <token>épreuve</token>
    </pattern>
  </rule>
  <rule>
    <pattern>
      <marker><token>a</token></marker>
      <token>vol</token>
      <token>d</token>
      <token>’</token>
      <token>oiseau</token>
    </pattern>
  </rule>

In other words:
* each # character inside <tokens>...#...#...</tokens> creates
  a new <rule>.
* And the spaces inside <tokens>...</tokens> cause automatic
  tokenization so that something like <tokens>rude épreuve</tokens>
  is automatically interpreted as <token>rude</token><token>épreuve</token>.
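
To make the intended expansion concrete, here is a rough sketch in Python
of how it could work. This is only an illustration of the idea, not how
LanguageTool loads rules: the function name expand_tokens and the small
regex tokenizer are made up for this example, and I am assuming that an
apostrophe becomes a token of its own, as in the verbose rules above.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical tokenizer (an assumption for this sketch): runs of
# non-space, non-apostrophe characters are tokens, and each apostrophe
# (straight or typographic) is a token of its own.
_TOKEN_RE = re.compile(r"[^\s'’]+|['’]")


def expand_tokens(marker, spec):
    """Return one <rule> string per '#'-separated alternative in spec."""
    rules = []
    for phrase in spec.split("#"):
        words = _TOKEN_RE.findall(phrase)
        if not words:
            continue  # skip empty alternatives, e.g. a trailing '#'
        lines = [
            "<rule>",
            "  <pattern>",
            f"    <marker><token>{escape(marker)}</token></marker>",
        ]
        lines += [f"    <token>{escape(word)}</token>" for word in words]
        lines += ["  </pattern>", "</rule>"]
        rules.append("\n".join(lines))
    return rules


if __name__ == "__main__":
    spec = "plein temps#chaque fois#rude épreuve#vol d’oiseau"
    for rule in expand_tokens("a", spec):
        print(rule)
```

Running it on the <tokens> content from my example prints four <rule>
elements, one per alternative. A real implementation would of course
reuse the language's own tokenizer instead of a regex.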

I'm curious whether rule maintainers would find it useful.

Regards
Dominique