I don't know what you are trying to say. Perhaps you could explain it at the
meeting next week.

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message -----
From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; "Murray Sargent"
<[EMAIL PROTECTED]>; "Joseph Boyle" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, November 02, 2002 04:18
Subject: Re: Names for UTF-8 with and without BOM


> From: "Mark Davis" <[EMAIL PROTECTED]>
>
> > That is not sufficient. The first three bytes could represent a real
> > content character, ZWNBSP, or they could be a BOM. The label doesn't
> > tell you.
>
> There are several problems with this supposition -- most notably that
> using the character as a ZWNBSP is specifically not recommended, and that
> U+2060 is preferred.
>
> > This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE
> > 0xFF represents a BOM, and is not part of the content. In the second
> > case, it does *not* represent a BOM -- it represents a ZWNBSP, and must
> > not be stripped. The difference here is that the encoding name tells you
> > exactly what the situation is.
>
> I do not see this as a realistic scenario.  I would argue that if the BOM
> matches the encoding scheme, perhaps this was an intentional effort to
> make sure that applications which may not understand the higher level
> protocol can also see what the encoding scheme is.
>
> But even if we assume that someone has gone to the trouble of calling
> something UTF-16BE and has 0xFE 0xFF at the beginning of the file, what
> kind of content *is* such a code point that it is even worth calling out
> as a special case?
>
> If the goal is clear and unambiguous text, then the best way would be to
> simplify ALL of this. It was previously decided to always call it a BOM;
> why not stick with that?
>
> MichKa
>
>
>
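
The UTF-16 versus UTF-16BE distinction quoted above can be illustrated with
a minimal Python sketch. The codec names used here ("utf-16", "utf-16-be",
"utf-8", "utf-8-sig") are Python's own and are only an assumption for
illustration, not the encoding names under discussion in this thread; the
sample bytes are made up.

```python
import codecs

# Big-endian UTF-16 bytes for "A", preceded by 0xFE 0xFF.
data = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# Decoded as "utf-16": the leading 0xFE 0xFF is treated as a BOM and consumed.
as_utf16 = data.decode("utf-16")

# Decoded as "utf-16-be": the same two bytes are content (U+FEFF) and are kept.
as_utf16be = data.decode("utf-16-be")

# The UTF-8 analogue: 0xEF 0xBB 0xBF followed by "A".
utf8_data = codecs.BOM_UTF8 + b"A"

# Python's plain "utf-8" codec keeps the leading U+FEFF as content ...
utf8_plain = utf8_data.decode("utf-8")

# ... while its "utf-8-sig" codec strips it as a signature.
utf8_sig = utf8_data.decode("utf-8-sig")

print(repr(as_utf16), repr(as_utf16be), repr(utf8_plain), repr(utf8_sig))
```

In this sketch the same bytes yield different text depending only on the
label given to the decoder, which is the situation both messages are
arguing about.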

