Re: [PATCH] characters invalid for an encoding

Daniel Rall Thu, 05 May 2005 14:06:30 -0700

On Thu, 2005-05-05 at 02:35 +0200, Jochen Wiedmann wrote:
>Daniel Rall wrote:
...
>What does "invalid as un-encoded XML" mean? Not being within the
>encoding's character set?


I was referring to characters which had not been entity-encoded, that
is, written as references like &lt; or as numeric character references
of the form &#xNNNN;.

>If so, the range 0x20 to 0xff is quite arbitrary and not even valid in
>all cases. For example, it fails for "US-ASCII" encoding. In other
>words, to me this wasn't good.

This change was only intended to catch characters that are invalid in
XML, and it did an incomplete job of that.
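
For reference, a complete check would follow the "Char" production of
the XML 1.0 specification rather than a fixed 0x20 to 0xff range. The
following is only a minimal sketch of that rule; the class and method
names are illustrative and are not part of the patch or of Apache
XML-RPC.

```java
// Illustrative helper; the names here are assumptions, not existing XML-RPC API.
final class XmlChars {
    private XmlChars() {}

    /**
     * Returns true if the code point matches the XML 1.0 "Char" production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

Characters that fail this check cannot be rescued by escaping, since
XML 1.0 also forbids them as character references.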

...
>- Choose UTF-8 as the encoding; that means that only very few
>  characters ('<', for example) have to be escaped.

Ideally speaking, this option also strikes me as the cleanest.  Sadly,
the reality is that there are a lot of old XML-RPC clients and servers
out there in production, and that we could only offer this behavior as a
non-default configuration toggle.

>- Choose US-ASCII as the encoding. In other words, escape everything
>  beyond 0x7f.

John Wilson also made this suggestion.  Given the very real inter-op
concerns we have to live with, I propose that this be the default
behavior.
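
As a rough sketch of what "escape everything beyond 0x7f" could look
like, assuming the output is declared as US-ASCII: the class and method
names below are hypothetical and do not refer to existing XML-RPC code.
The sketch only handles escaping and does not reject characters that are
invalid in XML.

```java
// Hypothetical helper; not part of the Apache XML-RPC code base.
final class AsciiXmlEscaper {
    private AsciiXmlEscaper() {}

    /** Escapes markup characters and every code point above 0x7f. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            // Read whole code points so surrogate pairs become one reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp > 0x7f) {
                        // Decimal numeric character reference, e.g. &#233;
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }
}
```

Because the result contains only ASCII bytes, it reads the same whether
the receiving parser assumes US-ASCII, ISO-8859-1, or UTF-8, which is
the inter-op argument for making it the default.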

>- Invent a new interface and let the user decide, for example:
>
>      public interface XmlRpcEncoder {
>          String getEncoding();
>          boolean isEscaped(char pChar);
>      }

At the risk of over-engineering things, I also envisioned this type of
solution as the way to implement the UTF-8 toggle discussed above.
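
To make that concrete, here is a minimal sketch of how the two
behaviors could sit behind the proposed interface. The interface is
restated from the quoted suggestion; the two implementing classes and
the MarkupChars helper are assumptions made up for illustration and are
not existing XML-RPC classes.

```java
// Interface as proposed in the quoted message.
interface XmlRpcEncoder {
    String getEncoding();
    boolean isEscaped(char pChar);
}

// Hypothetical helper shared by both sketches below.
final class MarkupChars {
    private MarkupChars() {}

    static boolean isMarkup(char c) {
        return c == '<' || c == '>' || c == '&';
    }
}

// Hypothetical default: escape markup and everything beyond 0x7f.
final class AsciiXmlRpcEncoder implements XmlRpcEncoder {
    @Override
    public String getEncoding() {
        return "US-ASCII";
    }

    @Override
    public boolean isEscaped(char pChar) {
        return pChar > 0x7f || MarkupChars.isMarkup(pChar);
    }
}

// Hypothetical opt-in toggle: emit UTF-8 and escape only markup.
final class Utf8XmlRpcEncoder implements XmlRpcEncoder {
    @Override
    public String getEncoding() {
        return "UTF-8";
    }

    @Override
    public boolean isEscaped(char pChar) {
        return MarkupChars.isMarkup(pChar);
    }
}
```

With this shape, the writer would ask the configured encoder for the
encoding name and consult isEscaped() per character, so switching the
default would be a matter of configuration only.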
