John Cowan wrote:
> 
> Tex Texin scripsit:
> 
> > Interestingly, although I didn't study it in detail, looking at rfc 2376
> > for prioritization over charset conflicts, it seems to recommend
> > stripping the BOM when converting from utf-16 to other charsets (and
> > without considering that ucs-4 would like to keep it). (section 5).
> 
> The point is not to try to convert it into a U+FEFF character or some
> replacement thereof, like say "?".

That may be the intent, but that is not what it says. It should say to
convert the BOM to the equivalent BOM for the target encoding, if there
is one. Instead it says to strip it for other encodings.
(I wish it were called a signature rather than a BOM for most of these
usages.)
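
To make the wording I would prefer concrete, here is a minimal sketch in
Python. It is only an illustration of "carry the signature over when the
target has one, otherwise drop it", not anything RFC 2376 specifies. The
names `BOMS` and `transcode` are hypothetical, and I am assuming the
BOM-less codec names (`utf-16-be`, `utf-32-be`, ...) so that the signature
is handled explicitly rather than by the codec.

```python
import codecs

# Signatures for the encodings that define one (illustrative table).
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data, mapping a leading signature to the target's own."""
    text = data.decode(source)
    # The explicit-endianness codecs leave a leading U+FEFF in the text.
    had_signature = text.startswith("\ufeff")
    if had_signature:
        text = text[1:]
    body = text.encode(target)
    if had_signature and target in BOMS:
        # Target has an equivalent signature: emit it.
        return BOMS[target] + body
    # No equivalent signature (e.g. ISO-8859-1): strip it.
    return body
```

With this, a UTF-16BE document that starts with FE FF comes out of a
UTF-32BE conversion starting with 00 00 FE FF, and comes out of an
ISO-8859-1 conversion with no signature at all (and no "?").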

 
> > Also, in considering charset conflicts, 2376 fails to consider conflicts
> > between signature and the encoding declaration. (I have a utf-16BE BOM
> > and the encoding declaration is for utf-8...).
> 
> The encoding declaration is supposed to trump all.  So it is UTF-8, and
> since 0xFF is illegal in UTF-8, you blow chunks...

OK, but where is that written?
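
For what it is worth, the conflict and the resulting failure are easy to
reproduce. The sketch below assumes a strict UTF-8 decoder (Python's, in
this case); `sniff_signature` and the sample document are hypothetical and
only illustrate the situation. It shows what happens if the declaration is
trusted, not where that rule is written down.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32LE signature
# (FF FE 00 00) begins with the UTF-16LE signature (FF FE).
SIGNATURES = (
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-32-le", codecs.BOM_UTF32_LE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-be", codecs.BOM_UTF16_BE),
    ("utf-16-le", codecs.BOM_UTF16_LE),
)


def sniff_signature(data: bytes):
    """Return the encoding implied by a leading signature, or None."""
    for name, bom in SIGNATURES:
        if data.startswith(bom):
            return name
    return None


# A document with a UTF-16BE signature whose declaration claims UTF-8.
doc = codecs.BOM_UTF16_BE + '<?xml version="1.0" encoding="utf-8"?>'.encode("utf-16-be")

signature_says = sniff_signature(doc)  # encoding implied by the signature
try:
    doc.decode("utf-8")  # trust the declaration instead
    declaration_ok = True
except UnicodeDecodeError:
    # 0xFE and 0xFF never occur in well-formed UTF-8.
    declaration_ok = False
```

Here the signature implies UTF-16BE, while decoding as the declared UTF-8
fails on the very first byte.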

 
> > I'll have to check for a more up-to-date rfc.
> 
> There is none.

OK. Sorry if I seem to be difficult. I am just rereading a few things
with my new understanding to put the picture back together again.

tex
> 
> --
> John Cowan <[EMAIL PROTECTED]>     http://www.reutershealth.com
> I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
> han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
