Re: MS/Unix BOM FAQ again (small fix)

Kenneth Whistler Tue, 09 Apr 2002 20:56:49 -0700

> I agree, there are different ways to look at it. But the statement
> 
> > > > A Unicode text file beginning with FEFF is
> > > > big-endian, and a file beginning with FFFE (not a legal Unicode
> > > > character for any other purpose) is little-endian
> 
> is just plain wrong, since UTF-32, for example, could start with bytes
> FE FF.


Um, not legally in open interchange.

Either you have big-endian UTF-32 <FE FF nn mm ..> which would correspond
to U-FEFFnnmm ... -- and that is out-of-range for both Unicode and 10646.

Or you have little-endian UTF-32 <FE FF nn 00 ..> which would correspond
to U-00nnFFFE ..., where nn could be 00..10, but all such values are
noncharacters, and cannot be used in open interchange.

So if serialized "Unicode text" starts off <FE FF ...> and purports to be legal,
it cannot be UTF-32, it cannot be UTF-8, and it cannot be little-endian.
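
The case analysis above can be checked mechanically. What follows is a minimal sketch in Python, not anything from the Unicode Standard itself: the helper names (is_noncharacter, classify_utf32, is_valid_utf8) are made up for illustration, and the sketch only tests the two conditions discussed here (out of range, noncharacter), not every well-formedness rule such as surrogates.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify_utf32(data: bytes, byteorder: str) -> str:
    """Read the first 4 bytes as one UTF-32 code unit in the given byte order
    ("big" or "little") and say whether it is usable in open interchange."""
    cp = int.from_bytes(data[:4], byteorder)
    if cp > 0x10FFFF:
        return "out of range"
    if is_noncharacter(cp):
        return "noncharacter"
    return "ok"


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 (FE and FF never occur in UTF-8)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Big-endian reading of <FE FF nn mm>: U-FEFFnnmm, far above U+10FFFF.
print(classify_utf32(bytes([0xFE, 0xFF, 0x12, 0x34]), "big"))

# Little-endian reading of <FE FF nn 00>: U-00nnFFFE for nn in 00..10.
print({classify_utf32(bytes([0xFE, 0xFF, nn, 0x00]), "little")
       for nn in range(0x11)})

# UTF-8 reading of the same leading bytes.
print(is_valid_utf8(b"\xfe\xff"))

# UTF-16 little-endian reading of <FE FF>: the single code unit FFFE.
print(is_noncharacter(int.from_bytes(b"\xfe\xff", "little")))
```

Under those assumptions, the big-endian UTF-32 reading comes out "out of range", every little-endian UTF-32 reading with nn in 00..10 comes out "noncharacter", the UTF-8 check fails, and the little-endian UTF-16 reading is the noncharacter U+FFFE.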

--Ken
