On 6/4/2014 12:21 PM, Richard Wordingham wrote:
On Wed, 04 Jun 2014 11:40:11 -0700
Asmus Freytag <asm...@ix.netcom.com> wrote:

On 6/4/2014 11:26 AM, Doug Ewell wrote:
I meant U+FEFF as a zero-width no-break space. Obviously it is very
common to see U+FEFF as a signature or BOM.
Its semantics were chosen at the time so that it would make no sense
at the start of text, and so that it would be invisible in most situations.
What remained of its semantics was later taken over by Word Joiner (U+2060),
so there is now NO use for it as part of text.
The use as part of a convention has always been clear. If you stick
this at the front, readers will byte-reverse your data; that should
weed out accidental use pretty quickly :) Or prevent people from
getting "cute" with it in other ways.
Wrong!  If you stick U+FEFF at the start of a file, expect it to be
stripped.  If you stick U+FFFE at the start of a file, then expect
the rest of the text to be byte-reversed.
Duh. (reminder, have coffee first)

A./
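
For anyone following along, the distinction in the correction above can be
sketched in Python.  This is only an illustration, assuming Python's built-in
"utf-16" codec as the reader; the helper name decode_utf16 and the sample
bytes are made up for the example.  Other readers may behave differently.

```python
def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, letting a leading BOM select the byte order.

    Python's "utf-16" codec consumes the BOM, so it does not appear
    in the returned text.
    """
    return data.decode("utf-16")


# "ab" preceded by the bytes FE FF (U+FEFF serialized big-endian)
big_endian = b"\xfe\xff\x00a\x00b"

# "ab" preceded by the bytes FF FE (U+FEFF serialized little-endian)
little_endian = b"\xff\xfea\x00b\x00"

# Either way the signature is stripped and the text comes out the same.
print(decode_utf16(big_endian))
print(decode_utf16(little_endian))

# A reader that wrongly assumes big-endian sees U+FFFE first, followed by
# byte-swapped code units: the hint that the data needs byte-reversing.
print(ascii(little_endian.decode("utf-16-be")))
```

So U+FEFF read in the right byte order is silently dropped, and it is only
the appearance of U+FFFE that signals swapped bytes.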

So, I would think that for this particular code point, you can safely
assume that it's buggy or test data.
The example that's usually given is that of a text file sliced into
segments to avoid file size limits.  In these cases, there is the risk
that U+FEFF as ZWNBSP will wind up at the start of a segment and be
stripped.  The solution using the Windows command window is to perform a
*binary* concatenation of the segments; if one doesn't, newlines will
be inserted between the segments, which is much more severe damage.

Richard.
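
A rough sketch of the segment problem described above, assuming UTF-8 data
and Python's "utf-8-sig" codec standing in for a signature-stripping reader.
The sample text and the split point are invented for the illustration.

```python
# Text with U+FEFF used in the middle as ZWNBSP (hypothetical sample).
text = "foo\ufeffbar"
data = text.encode("utf-8")

# Slice the file so that the second segment happens to start with U+FEFF.
segments = [data[:3], data[3:]]

# Reading each segment on its own: the reader takes the leading U+FEFF
# of the second segment for a signature and strips it.
lossy = "".join(seg.decode("utf-8-sig") for seg in segments)

# Binary concatenation first, then a single read: the interior U+FEFF
# is no longer at the start of the data, so it survives.
joined = b"".join(segments).decode("utf-8-sig")

print(ascii(lossy))
print(ascii(joined))
```

Here the per-segment read loses the character, while the binary join
returns the original text unchanged.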


_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
