Re: CR+LF = 1 character?

Jörn Kottmann Wed, 20 Apr 2011 02:07:00 -0700

On 4/20/11 10:58 AM, Jens Grivolla wrote:

Hi,
while working on the integration between UIMA and a different text annotation system we ran into problems with differing offsets between the two systems.
As it turns out, the other system considers CR+LF (Windows style line endings) to be two characters, while UIMA sees it as one.

The string sofa inside a CAS contains 16-bit Unicode characters, and CR+LF are two Unicode characters. So I believe you are mistaken, or there is a bug somewhere that turns CR+LF into one char. All offsets are 16-bit Unicode offsets, even though one character might need two 16-bit slots. So it is possible to have an annotation over one character which has a length of two.
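To illustrate the point about 16-bit offsets, here is a minimal sketch in plain Java. It does not use any UIMA API; the class name Utf16OffsetDemo and its helper methods are made up for this example. It only assumes that the sofa text behaves like an ordinary java.lang.String, which also counts in 16-bit units.

```java
// Sketch only: plain Java strings, no UIMA classes involved.
// Utf16OffsetDemo, codeUnits and codePoints are hypothetical names.
class Utf16OffsetDemo {

    // Number of 16-bit code units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String crlf = "\r\n";
        // U+1D11E MUSICAL SYMBOL G CLEF, stored as a surrogate pair.
        String clef = "\uD834\uDD1E";

        // CR+LF: two code units and two code points.
        System.out.println(codeUnits(crlf) + " " + codePoints(crlf)); // 2 2

        // One character outside the BMP: two code units, one code point.
        System.out.println(codeUnits(clef) + " " + codePoints(clef)); // 2 1

        // 'b' sits at offset 3 because CR and LF each take one slot.
        System.out.println("a\r\nb".indexOf('b')); // 3
    }
}
```

If the other system reports the same offsets as this for a text with CR+LF line endings, the difference is probably introduced somewhere before the text reaches the CAS, for example by a reader that normalizes line endings.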


Jörn
