Hi,

Yes, this example won't work without changes, because the word list is
sensitive to whitespace: your list distinguishes between "n.C." and
"n. C.". I know this sounds like a bug, but it is rather a feature.

To solve your problem, you could remove all spaces from your word list,
add "n.Chr." and "v.Chr." (without space) to your word list, or retain
the spaces before calling MARKFAST
(Document{-> RETAINTYPE(SPACE)};).

The short explanation is that, with the default filtering settings, the
action and the word list lookup don't see any spaces in the document, so
they check a candidate like "n.Chr". In the trie, however, the path
without a space before the "C" (from "n.C.") has no "h"; the "h" exists
only on the path with the space.
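
To make the explanation above more concrete, here is a small Python
sketch of such a lookup. It is only a simplified model and not the actual
Ruta implementation: the names TrieNode, build_trie and contains are made
up for illustration, and the rule that the lookup steps over a stored
space only when no direct child matches (greedy, no backtracking) is an
assumption chosen to reproduce the behaviour described in this thread.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_word = False  # True if a word list entry ends here


def build_trie(words):
    """Store every word list entry character by character, spaces included."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def contains(root, candidate):
    """Greedy lookup without backtracking (assumed behaviour, see above)."""
    node = root
    for ch in candidate:
        # Assumption: a stored space is skipped only if there is no
        # direct child for the next character of the candidate.
        if ch not in node.children and " " in node.children:
            node = node.children[" "]
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_word


words = ["n.C.", "n. C.", "n. Chr."]
trie = build_trie(words)

# Default filtering: spaces are invisible, so the candidate is "n.Chr."
print(contains(trie, "n.Chr."))   # False: the "n.C" path has no "h"

# Spaces retained: the candidate keeps its space and follows the other path
print(contains(trie, "n. Chr."))  # True

# Adding "n.Chr." (without space) to the word list also helps
print(contains(build_trie(words + ["n.Chr."]), "n.Chr."))  # True
```

In this model, a list that contains only "v. Chr." still matches the
candidate "v.Chr.", because there is no competing path without the space.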

Best,

Peter

On 22.05.2013 10:52, armin.weg...@bka.bund.de wrote:
> Hi Peter,
>
> your example does work perfectly fine. But try this as word list and input 
> document:
>
> nach Christus
> nach der Zeitenwende
> n. C.
> n.C.
> nC.
> n. Chr.
> n. d. Z.
> n.d.Z.
> unserer Zeit
> unserer Zeitrechnung
> u. Z.
> u.Z.
> v. C.
> v.C.
> vC.
> v. Chr.
> v. d. Z.
> v.d.Z.
> vor Christus
> vor der Zeitenwende
> vor unserer Zeitrechnung
> v. u. Z.
> v.u.Z.
>
> "n. Chr." and "v. Chr." are not recognized. Do you have the same result?
>
> Cheers,
> Armin
>
>
> -----Original Message-----
> From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
> Sent: Tuesday, 21 May 2013 19:58
> To: user@uima.apache.org
> Subject: Re: Ruta - MARKFAST
>
> Hi,
>
> On 21.05.2013 15:49, armin.weg...@bka.bund.de wrote:
>> Hello!
>>
>> Is there any possibility to match strings like
>>
>> nC.
>> v. Chr.
>>
>> with MARKFAST?
> Yes. Did you observe any problems? I just tested it with:
>
> Wordlist:
> nC.
> v. Chr.
>
> Input document:
> nC.
> v. Chr.
> n C .
> v . Chr.
>
> Script:
> PACKAGE uima.ruta.tests;
> WORDLIST testList = 'test.txt';
> DECLARE Test;
> Document{->MARKFAST(Test, testList)};
>
> ... creates four annotations of type Test.
>
> Best,
>
> Peter
>
>
>
>> Cheers,
>> Armin
