On Thu, 2005-05-05 at 02:35 +0200, Jochen Wiedmann wrote:
>Daniel Rall wrote:
...
>What does "invalid as un-encoded XML" mean? Not being within the
>encoding's character set?
I was referring to characters which had not been entity-encoded using
references like &lt; or &#xffff;.
>If so, the range 0x20 to 0xff is quite arbitrary and not even valid in
>all cases. For example, it fails for "US-ASCII" encoding. In other
>words, to me this wasn't good.
This change was only intended to catch characters invalid in XML, a job
it did incompletely.
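
For reference, XML 1.0 defines the set of legal characters in its Char
production. A complete check could look roughly like the sketch below.
The class and method names are hypothetical and not part of the existing
code; it works on code points, so it assumes a Java version that has
them.

```java
// Hypothetical helper, not part of the current code base.
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD   // tab, LF, CR
            || (cp >= 0x20 && cp <= 0xD7FF)          // BMP below the surrogates
            || (cp >= 0xE000 && cp <= 0xFFFD)        // BMP above the surrogates
            || (cp >= 0x10000 && cp <= 0x10FFFF);    // supplementary planes
    }
}
```

Characters that fail this check cannot appear in an XML 1.0 document
even as character references, so they have to be rejected or replaced
rather than escaped.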
...
>- Choose UTF-8 as the encoding; that means that only very few
> characters ('<', for example) have to be escaped.
Ideally speaking, this option also strikes me as the cleanest. Sadly,
the reality is that there are a lot of old XML-RPC clients and servers
out there in production, and that we could only offer this behavior as a
non-default configuration toggle.
>- Choose US-ASCII as the encoding. In other words, escape everything
> beyond 0x7f.
John Wilson also made this suggestion. Given the very real inter-op
concerns we have to live with, I propose that this be the default
behavior.
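
To make the proposed default concrete, here is a minimal sketch of
"escape everything beyond 0x7f". The class name AsciiXmlEscaper is
hypothetical, and the sketch assumes the input has already been checked
for characters that are illegal in XML; it only shows the escaping step.

```java
// Hypothetical sketch of US-ASCII-safe escaping; not the current implementation.
final class AsciiXmlEscaper {
    private AsciiXmlEscaper() {}

    /** Escapes markup characters and everything above 0x7f as references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp > 0x7f) {
                        // Numeric character reference, e.g. &#xe9;
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }
}
```

The output contains only 7-bit characters, so it reads the same whether
the receiving side assumes US-ASCII, ISO-8859-1 or UTF-8.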
>- Invent a new interface and let the user decide, for example:
>
> public interface XmlRpcEncoder {
>     String getEncoding();
>     boolean isEscaped(char pChar);
> }
Without wanting to over-engineer things, I also envisioned this type of
solution as the way to implement the UTF-8 toggle discussed above.
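
A rough sketch of how the toggle could sit on top of such an interface
follows. XmlRpcEncoder and its two methods are taken from the suggestion
above; Utf8Encoder, UsAsciiEncoder and EncoderConfig are hypothetical
names used only for illustration.

```java
// Interface as suggested in the quoted message.
interface XmlRpcEncoder {
    String getEncoding();
    boolean isEscaped(char pChar);
}

// Hypothetical non-default option: escape only the markup characters.
final class Utf8Encoder implements XmlRpcEncoder {
    public String getEncoding() { return "UTF-8"; }
    public boolean isEscaped(char pChar) {
        return pChar == '<' || pChar == '>' || pChar == '&';
    }
}

// Hypothetical default: additionally escape everything beyond 0x7f.
final class UsAsciiEncoder implements XmlRpcEncoder {
    public String getEncoding() { return "US-ASCII"; }
    public boolean isEscaped(char pChar) {
        return pChar == '<' || pChar == '>' || pChar == '&' || pChar > 0x7f;
    }
}

// Hypothetical configuration hook implementing the toggle.
final class EncoderConfig {
    private EncoderConfig() {}

    static XmlRpcEncoder select(boolean useUtf8) {
        return useUtf8 ? new Utf8Encoder() : new UsAsciiEncoder();
    }
}
```

With the toggle off, callers get the US-ASCII behavior and existing
clients and servers see no change; turning it on is an explicit opt-in.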