Re: [PATCH] characters invalid for an encoding

Jochen Wiedmann Wed, 04 May 2005 17:36:10 -0700

Daniel Rall wrote:

> On 2002/08/19, CVS rev 1.3 of XmlWriter introduced code to entity encode
> characters in the range 0x20 to 0xff, characters which are invalid as
> un-encoded _XML_.  And so it was Good.


Sorry for asking, but I am getting more and more confused. :-)

What does "invalid as un-encoded XML" mean? Not being within the
encoding's character set?

If so, the range 0x20 to 0xff is quite arbitrary and not even valid in
all cases. For example, it fails for the "US-ASCII" encoding, which
cannot represent the characters from 0x80 to 0xff. So, to me, this
wasn't good.
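To illustrate that "what can be written unescaped" depends on the
encoding, here is a small sketch using the JDK's
java.nio.charset.CharsetEncoder.canEncode() to check whether every
character from 0x20 to 0xff can be written as-is. The class and method
names (EncodableCheck, canEncodeRange) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodableCheck {
    // Returns true if every character in [from, to] can be written
    // unescaped in the given encoding. Assumes to < '\uffff' so the
    // char loop terminates.
    static boolean canEncodeRange(String encoding, char from, char to) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        for (char c = from; c <= to; c++) {
            if (!encoder.canEncode(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String enc : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(enc + ": " + canEncodeRange(enc, '\u0020', '\u00ff'));
        }
        // Prints false for US-ASCII, true for ISO-8859-1 and UTF-8.
    }
}
```

So a fixed range can only be right for some encodings, never for all
of them.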


> With the restriction on ASCII-only <string> payloads removed, do we want
> to go back to the days of CVS rev 1.3, where all characters which are
> not valid _XML_ are entity encoded, and no special handling is enforced
> based on the XmlWriter's encoding?

Apart from the fact that I do not understand what has actually been
restricted (the lexical representation or the actual character set),
and that the latter would make XML-RPC pretty useless to most of us:
the restriction is gone.

So we have, IMO, the following options:

- Choose UTF-8 as the encoding; that means only very few
  characters ('<', for example) have to be escaped.

- Choose US-ASCII as the encoding; that means escaping everything
  above 0x7f.

- Invent a new interface and let the user decide, for example:

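      public interface XmlRpcEncoder {
          // The encoding to declare and write in, e.g. "UTF-8".
          String getEncoding();
          // Whether pChar must be written as a character reference.
          boolean isEscaped(char pChar);
      }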
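A rough sketch of how the three options could fit together: the
proposed interface, two hypothetical implementations matching the
first two options (Utf8Encoder, AsciiEncoder), and a hypothetical
escape() helper that writes flagged characters as numeric character
references. The helper walks the string by code point so a surrogate
pair becomes a single reference. It does not deal with characters
that XML 1.0 forbids outright (e.g. 0x01), which cannot be escaped
at all.

```java
public class EncoderSketch {
    // The interface proposed above.
    public interface XmlRpcEncoder {
        String getEncoding();
        boolean isEscaped(char pChar);
    }

    // Option 1 (hypothetical): UTF-8 can represent every character,
    // so only markup-significant characters are escaped.
    static class Utf8Encoder implements XmlRpcEncoder {
        public String getEncoding() { return "UTF-8"; }
        public boolean isEscaped(char pChar) {
            return pChar == '<' || pChar == '>' || pChar == '&';
        }
    }

    // Option 2 (hypothetical): US-ASCII, so everything above 0x7f
    // (including surrogates) is escaped as well.
    static class AsciiEncoder implements XmlRpcEncoder {
        public String getEncoding() { return "US-ASCII"; }
        public boolean isEscaped(char pChar) {
            return pChar > 0x7f || pChar == '<' || pChar == '>' || pChar == '&';
        }
    }

    // Hypothetical helper: replaces every character the encoder flags
    // with a numeric character reference such as &#233;.
    static String escape(String pText, XmlRpcEncoder pEncoder) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < pText.length()) {
            int cp = pText.codePointAt(i);
            // For a surrogate pair, the decision is based on the high
            // surrogate, but the whole code point is referenced once.
            if (pEncoder.isEscaped(pText.charAt(i))) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "a < b, caf\u00e9";
        System.out.println(escape(text, new Utf8Encoder()));  // a &#60; b, café
        System.out.println(escape(text, new AsciiEncoder())); // a &#60; b, caf&#233;
    }
}
```

With option 1 the output stays readable; with option 2 it is pure
ASCII at the cost of more references.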

Personally, I very clearly favour the first option.


Jochen
