Dear Jim,
I think I understand what you mean. Playing with external analyzers
which add/remove tags to/from the text with regular expressions may
lead to this situation. The problem cannot exist with stand-off
annotations.
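
To make the contrast concrete, here is a minimal sketch in Python rather than UIMA code. The `Span` class and the `covered_text` helper are hypothetical names chosen for illustration and are not part of the UIMA API. It assumes annotations are stored as character offsets next to the text. The offsets are valid as long as the text is untouched, and they go stale as soon as a regular expression rewrites the whitespace.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-off annotation: offsets into a text, stored apart from it."""
    begin: int
    end: int
    label: str


def covered_text(text: str, span: Span) -> str:
    """Return the substring of `text` that the span points at."""
    return text[span.begin:span.end]


# Original text (note the double space) and its stand-off annotations.
text = "Le  chat dort."
spans = [Span(0, 2, "Token"), Span(4, 8, "Token"), Span(9, 13, "Token")]

# While the text is unchanged, every span resolves to the intended token.
original_tokens = [covered_text(text, s) for s in spans]

# An external step that rewrites the text (here: collapsing whitespace)
# silently invalidates the offsets, because they still refer to the old text.
normalized = re.sub(r"\s+", " ", text)
stale_tokens = [covered_text(normalized, s) for s in spans]

print(original_tokens)
print(stale_tokens)
```

Keeping the annotated text fixed is what keeps the offsets meaningful in this sketch; any step that edits the text would also have to recompute the offsets.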
Assuming that only the number of whitespace characters has changed, you
ma
I apologize if my terminology doesn't match normal UIMA usage - but
hopefully the general idea will be understandable.
Is it always assumed that UIMA's document text is immutable? Let's say you have
some text with several position-based annotations. The text changes, and now
all of your an
Hi all,
In case some of you are interested, I've implemented a UIMA component
to do word tokenization. This component handles tokenization of French
texts better than the WhitespaceTokenizer does.
The details of the implementation are described on my blog [1] (in
French only, sorry),