"Michael Kaplan (Trigeminal Inc.)" wrote:
> Actually, the XML spec is quite clear that neither UTF-16 nor UTF-8 require
> the encoding tag.... XML is defined by one of the following:
> 
> 1) Starts with byte Mark for Big-Endian/Little-Endian Unicode -- go with the
> byte mark
> 
> 2) No encoding information... UTF-8  can be assumed (often it is just ASCII
> so this works)
> 
> 3) Any other encoding, use the encoding tag as Marcus mentions

you can do without the encoding declaration for utf-8 and utf-16, but you should include it anyway.
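
as a rough sketch of the three cases quoted above, here is how the decision could look in python. this is an illustration, not code from any real parser: the name sniff_xml_encoding is made up, and it deliberately ignores utf-32, utf-16 without a byte order mark, and ebcdic, which a full implementation would also have to handle.

```python
import codecs
import re

# Matches an encoding declaration at the very start of an ASCII-compatible
# document, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes.

    Hypothetical helper for illustration only; simplified on purpose.
    """
    # Case 1: a byte order mark decides the matter.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"

    # Case 3: any other encoding has to be named in the declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()

    # Case 2: no encoding information at all, so assume utf-8.
    return "utf-8"
```

a document that carries the declaration is labelled explicitly, so nothing has to be guessed; that is why it is worth including even where it is optional.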

> Clearly, we are being told that this is not a requirement of an XML
> processor. Unfortunately, most of the ones out there do not understand the
> encoding tag, cannot read UTF-16 files, and destroy UTF-8 outside of the
> ASCII range.

the ibm xml parser, which is open source and also part of apache, does read encodings as
specified and handles a number of others, too. with icu underneath, you get support for
more than 60 codepages.

markus
