I don't know what you are trying to say. Perhaps you could explain it at the meeting next week.
Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>; "Murray Sargent" <[EMAIL PROTECTED]>; "Joseph Boyle" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, November 02, 2002 04:18
Subject: Re: Names for UTF-8 with and without BOM

> From: "Mark Davis" <[EMAIL PROTECTED]>
>
> > That is not sufficient. The first three bytes could represent a real content
> > character, ZWNBSP or they could be a BOM. The label doesn't tell you.
>
> There are several problems with this supposition -- most notably the fact
> that there are cases that specifically claim this is not recommended and
> that U+2060 is preferred?
>
> > This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE 0xFF
> > represents a BOM, and is not part of the content. In the second case, it
> > does *not* represent a BOM -- it represents a ZWNBSP, and must not be
> > stripped. The difference here is that the encoding name tells you exactly
> > what the situation is.
>
> I do not see this as a realistic scenario. I would argue that if the BOM
> matches the encoding scheme, perhaps this was an intentional effort to make
> sure that applications which may not understand the higher level protocol
> can also see what the encoding scheme is.
>
> But even if we assume that someone has gone to the trouble of calling
> something UTF-16BE and has 0xFE 0xFF at the beginning of the file, what kind
> of content *is* such a code point that this is even worth calling out as a
> special case?
>
> If the goal is clear and unambiguous text, then the best way would be to
> simplify ALL of this. It was previously decided to always call it a BOM, why
> not stick with that?
>
> MichKa
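
For readers following the UTF-16 vs UTF-16BE point quoted above, here is a minimal sketch of the distinction. It assumes Python's built-in codec names ("utf-16", "utf-16-be", "utf-8", "utf-8-sig"), which are Python's own labels and not names proposed anywhere in this thread; the variable names are made up for illustration.

```python
# Illustrative sketch only: how Python's built-in codecs treat a leading BOM.
# The codec names below are Python's labels, not ones proposed in the thread.

data_utf16 = b"\xfe\xff\x00A"   # FE FF, then U+0041 in big-endian byte order
data_utf8 = b"\xef\xbb\xbfA"    # EF BB BF, then U+0041

# "utf-16": the leading FE FF is read as a BOM and is not part of the content.
as_utf16 = data_utf16.decode("utf-16")

# "utf-16-be": the label fixes the byte order, so FE FF decodes to U+FEFF
# and stays in the decoded text.
as_utf16be = data_utf16.decode("utf-16-be")

# "utf-8" keeps a leading EF BB BF as U+FEFF; "utf-8-sig" strips it.
as_utf8 = data_utf8.decode("utf-8")
as_utf8sig = data_utf8.decode("utf-8-sig")

for label, text in [
    ("utf-16", as_utf16),
    ("utf-16-be", as_utf16be),
    ("utf-8", as_utf8),
    ("utf-8-sig", as_utf8sig),
]:
    # Show the decoded code points so an invisible U+FEFF is easy to spot.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

In this sketch the same bytes yield different content depending only on the label, which is the situation the quoted text describes for UTF-16 and says the plain "UTF-8" label does not resolve.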