It is a pretty good assumption, but if BOMs are used on smaller fields,
the probability that it fails goes up. And to be perfectly reliable, you
can't assume it.

That is one reason the WORD JOINER (U+2060) was encoded: so that
eventually we can use U+FEFF solely as a BOM.
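
As a rough illustration of the sniffing rule quoted below (a sketch, not
taken from any particular implementation), here is a minimal Python
version. The names sniff_bom and _BOMS are made up for this example; only
the codecs.BOM_* constants come from the standard library. It assumes the
4-byte UTF-32 signatures are checked before the 2-byte UTF-16 ones, which
is why a UTF-16LE stream whose first character is U+0000 would be misread
as UTF-32LE.

```python
import codecs

# Hypothetical lookup table for this sketch. Longest signatures come first:
# the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
# so the order of the checks matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, payload) with the BOM stripped, or (None, data).

    Assumption (the footnote in the quoted message): no UTF-16 stream has
    U+0000 as its first character. Otherwise FF FE 00 00 is ambiguous.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data
```

For example, the bytes FF FE 00 00 41 00 could be a UTF-16LE BOM followed
by U+0000 and "A", but this sketch reports utf-32-le for them. That is the
footnoted assumption in action.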

Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Mark Davis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Wednesday, April 10, 2002 22:35
Subject: Re: MS/Unix BOM FAQ again (small fix)


> Mark Davis <[EMAIL PROTECTED]> wrote:
>
> > - when one of the BOM-allowing UTFs starts with a BOM, you know the
> > encoding*, and you strip off the BOM when you get the content.
> >
> > *assuming that no UTF-16 file has U+0000 as the first character.
>
> In the real world, this is a pretty good assumption -- almost as good,
> in fact, as the one I've been stating for years:  that no Unicode file
> will have a zero-width no-break space (intended as such) as the first
> character.
>
> -Doug Ewell
>  Fullerton, California

