"Michael Kaplan (Trigeminal Inc.)" wrote:
> Actually, the XML spec is quite clear that neither UTF-16 nor UTF-8 requires
> the encoding tag... An XML document falls into one of the following cases:
>
> 1) Starts with a byte order mark for Big-Endian/Little-Endian Unicode -- go
> with the byte order mark
>
> 2) No encoding information... UTF-8 can be assumed (often it is just ASCII
> so this works)
>
> 3) Any other encoding, use the encoding tag as Marcus mentions
You can do without it for UTF-8 and UTF-16, but you should have it anyway.
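
To make the three cases above concrete, here is a rough sketch in Python. It is only an illustration of the decision order, not how any particular parser does it: the function name sniff_xml_encoding is made up, the UTF-8 signature check is my addition to case 1, and it ignores UTF-32, EBCDIC, and UTF-16 text that has no byte order mark.

```python
import re

# Hypothetical helper (not taken from any real parser): guess the encoding
# of an XML document from its leading bytes, in the order of the three cases.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    # Case 1: a byte order mark decides.
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 signature; an addition to the quoted list

    # Case 3: an explicit encoding declaration at the very start.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()

    # Case 2: no encoding information, so assume UTF-8.
    return "utf-8"
```

For example, sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>') returns "iso-8859-1", and a bare b"<a/>" falls through to "utf-8".
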
> Clearly, we are being told that this is not a requirement of an XML
> processor. Unfortunately, most of the ones out there do not understand the
> encoding tag, cannot read UTF-16 files, and destroy UTF-8 outside of the
> ASCII range.
The IBM XML parser, which is open source and also part of Apache, does read
encodings as specified and handles a number of others, too. You can put ICU
underneath it and get more than 60 codepages.
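
A quick way to check whether a given parser gets this right is to feed it the same non-ASCII text in two forms. The sketch below uses Python's standard xml.etree.ElementTree purely as a stand-in, since that is easy to run; it is not the IBM parser, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Latin-1 bytes with a matching encoding declaration (case 3 above).
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>J\u00fcrgen</name>'
).encode("iso-8859-1")

# UTF-8 bytes with no encoding information at all (case 2 above).
utf8_doc = "<name>J\u00fcrgen</name>".encode("utf-8")

# A conforming parser should recover the same text from both documents.
latin1_name = ET.fromstring(latin1_doc).text
utf8_name = ET.fromstring(utf8_doc).text
print(latin1_name == utf8_name)
```

If the two results differ, or the u-umlaut comes back mangled, the parser is ignoring the declaration or mishandling UTF-8 outside the ASCII range.
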
markus