Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

Richard Wordingham Thu, 05 Jun 2014 10:43:08 -0700

On Thu, 5 Jun 2014 09:41:07 +0200
Philippe Verdy <verd...@wanadoo.fr> wrote:


> You'll probably want to sync on the first newline control and then
> proceed from that point. But if those devices are configured
> heterogeneously and each generates its own output encoding, you won't
> necessarily know how the stream is encoded, even if all of them use
> some UTF of Unicode. So the stream will regularly repost an encoding
> mark, for example at the beginning of each dated log entry, and this
> could be just an encoded BOM (even with UTF-8, or with some other UTF
> such as UTF-16, which would be more likely if the text were
> essentially in an East-Asian (CJK) language).
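
The per-entry encoding mark described in the quoted text can be sketched
as follows. This is only an illustration under stated assumptions: the
entries are assumed to have been split apart already (the sketch does not
attempt the harder problem of finding the newline in a stream of unknown
encoding), and the name decode_entry and the UTF-8 fallback are
hypothetical choices, not anything specified in the thread.

```python
import codecs

# Candidate marks, longest first, so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
# Known ambiguity: a UTF-16 LE entry whose first character after the BOM
# is U+0000 would be misread as UTF-32 LE by this ordering.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_entry(entry: bytes, default: str = "utf-8") -> str:
    """Decode one log entry, using a leading BOM (if any) to pick the UTF.

    `default` is an assumed fallback for entries that carry no mark.
    """
    for bom, encoding in _BOMS:
        if entry.startswith(bom):
            # Drop the mark itself and decode the rest with the UTF it names.
            return entry[len(bom):].decode(encoding)
    return entry.decode(default)


if __name__ == "__main__":
    entry = codecs.BOM_UTF16_BE + "2014-06-05 日本語".encode("utf-16-be")
    print(decode_entry(entry))
```

Checking the four-byte marks before the two-byte ones is a design choice
that favours UTF-32; a real logger would have to decide which way to
resolve that ambiguity.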

Of course, this is not an arbitrary fragment.  In this location, ZWNBSP
will have almost no effect.  (The only mechanisms I can think of are
character counts and the text being pasted immediately after another
word.)  This, and the early belief that U+FFFE would not occur in
Unicode text, are why U+FEFF was chosen as the mark.
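
Both points can be checked with a small sketch. It assumes Python's
standard codecs, and the helper names swapped_bom and
count_with_and_without_mark are hypothetical, introduced only for
illustration. The first helper reads the big-endian bytes of U+FEFF with
the wrong byte order; the second shows the character-count effect of a
mark that is left in the text rather than consumed as a BOM.

```python
import struct


def swapped_bom() -> int:
    """Return the code unit seen when U+FEFF is read with the wrong byte order."""
    bom_be = "\ufeff".encode("utf-16-be")  # the two bytes FE FF
    (unit,) = struct.unpack("<H", bom_be)  # deliberately read as little-endian
    return unit


def count_with_and_without_mark(word: str) -> tuple[int, int]:
    """Return (length with the mark kept as a character, length with it consumed)."""
    data = "\ufeff".encode("utf-16-le") + word.encode("utf-16-le")
    kept = data.decode("utf-16-le")  # explicit byte order: U+FEFF stays in the text
    consumed = data.decode("utf-16")  # generic codec: leading BOM is stripped
    return len(kept), len(consumed)


if __name__ == "__main__":
    print(hex(swapped_bom()))
    print(count_with_and_without_mark("word"))
```

The difference between the two decoders here is specific to Python's
"utf-16" and "utf-16-le" codecs; other libraries may handle a leading
U+FEFF differently.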

Richard.
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
