On 24 Apr 2005, at 09:37, Christoph Theis wrote:

The original spec allowed ASCII characters only for strings. The word "ASCII"
was removed in 2003. XmlWriter still checks for the range 0x20 ... 0xff
and not for the range allowed by the XML spec. There has been a lot of
discussion about this over the last few years, and as far as I know Apache's
(our) xmlrpc still clings to ASCII characters. But I might be wrong ...



Apache XML-RPC uses the ISO 8859/1 encoding (it emits an XML declaration saying this). 8859/1 is an eight-bit encoding, so only Unicode code points up to 0xFF can be represented directly. Code points with values greater than this should be represented by character references (e.g. &#x1FF; for the character ǿ). I think that XmlWriter does this. I'm sorry, but I do not have ready access to the source code from this machine, so I can't check the details directly.
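
To make the idea concrete, here is a minimal sketch of that kind of escaping. It is not the XmlWriter code (which I have not checked); the class name Latin1Escaper and its method are made up for illustration, and it assumes the output is declared as ISO 8859/1. Validation of control characters that XML does not allow is left out.

```java
// Hypothetical helper, not part of Apache XML-RPC.
final class Latin1Escaper {

    // Escapes text for an XML document whose declared encoding is ISO 8859/1.
    // Code points up to 0xFF pass through; anything larger becomes a
    // hexadecimal character reference. Markup characters are escaped as well.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp > 0xFF) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in ISO 8859/1 and is kept; U+01FF does not and is escaped.
        System.out.println(escape("caf\u00E9 \u01FF"));
    }
}
```

The result still has to be written out with an ISO 8859/1 encoder so that the octets match the XML declaration.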


The use of ISO 8859/1 has always been a bit of a puzzle to me. XML parsers are only required to understand UTF-8 and UTF-16, so using ISO 8859/1 theoretically reduces interoperability. However, I do not recall ever hearing of such a problem in practice. My own view is that, for maximum interoperability, only code points up to 127 should be represented directly; values greater than 127 should be represented by character references. The cost of doing this is that the number of octets used rises when non-US-ASCII characters are exchanged.
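
As a rough illustration of that trade-off, the sketch below escapes everything above 127 and compares octet counts. The class AsciiEscaper is hypothetical and only shows the approach I am describing; it is not what Apache XML-RPC does, and markup escaping is omitted to keep it short.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache XML-RPC.
final class AsciiEscaper {

    // Keeps code points 0..127 as they are and turns everything else into a
    // decimal character reference, so the output is pure US-ASCII.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        String escaped = escape(original);

        // Octets when the character is sent directly in ISO 8859/1.
        int direct = original.getBytes(StandardCharsets.ISO_8859_1).length;
        // Octets when the character is sent as a character reference.
        int viaReference = escaped.getBytes(StandardCharsets.US_ASCII).length;

        System.out.println(escaped + ": " + direct + " octets direct, "
                + viaReference + " octets escaped");
    }
}
```

Here one octet in ISO 8859/1 becomes six octets as a character reference, which is the cost mentioned above; for text that is mostly US-ASCII the overhead is small.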


John Wilson The Wilson Partnership http://www.wilson.co.uk


