Re: Mutable text and annotations...

2010-09-08 Thread Nicolas Hernandez
Dear Jim, I think I understand what you mean. Using external analyzers that add or remove tags in the text with regular expressions may lead to this situation. The problem cannot exist with stand-off annotations. Assuming that only the number of whitespace characters has changed, you ma

Mutable text and annotations...

2010-09-08 Thread Jim Hargrave
I apologize if my terminology doesn't match normal UIMA usage, but hopefully the general idea will be understandable. Is it always assumed that UIMA's document text is immutable? Let's say you have some text with several position-based annotations. The text changes; now all of your an

Word tokenizer for French

2010-09-08 Thread Fabien POULARD
Hi all, In case some of you are interested, I've implemented a UIMA component for word tokenization. This component handles tokenization of French texts better than the WhitespaceTokenizer does. The details of the implementation are described on my blog [1] (in French only, sorry),