From: "Mark Davis" <[EMAIL PROTECTED]>

> That is not sufficient. The first three bytes could represent a real content
> character, ZWNBSP or they could be a BOM. The label doesn't tell you.

There are several problems with this supposition -- most notably the fact that there are cases that specifically claim this is not recommended and that U+2060 is preferred.

> This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE 0xFF
> represents a BOM, and is not part of the content. In the second case, it
> does *not* represent a BOM -- it represents a ZWNBSP, and must not be
> stripped. The difference here is that the encoding name tells you exactly
> what the situation is.

I do not see this as a realistic scenario. I would argue that if the BOM matches the encoding scheme, perhaps this was an intentional effort to make sure that applications which may not understand the higher level protocol can also see what the encoding scheme is.

But even if we assume that someone has gone to the trouble of calling something UTF-16BE and has 0xFE 0xFF at the beginning of the file, what kind of content *is* such a code point that this is even worth calling out as a special case?

If the goal is clear and unambiguous text, then the best way would be to simplify ALL of this. It was previously decided to always call it a BOM, so why not stick with that?

MichKa
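
For concreteness, here is a minimal Python sketch of the distinction quoted above: the same leading bytes are either dropped as a BOM or kept as U+FEFF content, depending only on the encoding label. It relies on Python's built-in codecs, where "utf-16" consumes a leading BOM and "utf-16-be" does not, and where "utf-8-sig" strips the three-byte UTF-8 signature while plain "utf-8" keeps it. The helper name decode_labeled and the sample bytes are hypothetical, chosen just for illustration.

```python
import codecs

ZWNBSP = "\ufeff"  # U+FEFF: the same code point serves as BOM and as ZWNBSP


def decode_labeled(data: bytes, label: str) -> str:
    """Decode bytes strictly according to the declared label (hypothetical helper)."""
    return data.decode(label)


# Bytes FE FF 00 41: a leading FE FF followed by "A" in big-endian UTF-16.
utf16_sample = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# Label "UTF-16": the leading FE FF is read as a BOM and is not part of the content.
as_utf16 = decode_labeled(utf16_sample, "utf-16")

# Label "UTF-16BE": the leading FE FF is content (U+FEFF) and is not stripped.
as_utf16be = decode_labeled(utf16_sample, "utf-16-be")

# Bytes EF BB BF 41: the UTF-8 case from the first quoted paragraph.
utf8_sample = codecs.BOM_UTF8 + b"A"

# Plain "utf-8" keeps U+FEFF; "utf-8-sig" treats the same bytes as a signature.
as_utf8 = decode_labeled(utf8_sample, "utf-8")
as_utf8_sig = decode_labeled(utf8_sample, "utf-8-sig")

print(repr(as_utf16), repr(as_utf16be), repr(as_utf8), repr(as_utf8_sig))
```

In this sketch the decision is made entirely by the label passed in; nothing in the bytes themselves says whether the leading U+FEFF was meant as a signature or as content, which is the ambiguity under discussion.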