On Thu, Feb 03, 2005 at 07:39:27AM -0700, [EMAIL PROTECTED] wrote:
> Recently, we found out that some of our XML text nodes contain the 0x1A
> character.  This causes the Xerces parser to throw an Invalid character
> (Unicode: 0x1A) error. 
> 
> Upon investigating the XML specs, the XML 1.0 Spec does not show that in
> the list of valid characters.  However, the XML 1.1 spec lists it as
> valid but restricted (not sure what that means).

See the XML 1.1 Recommendation:
  http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-xml11
    "
    1.3 Rationale and list of changes for XML 1.1

    ...
    Therefore, XML 1.1 allows the use of character references to the control
    characters #x1 through #x1F, most of which are forbidden in XML 1.0. For
    reasons of robustness, however, these characters still cannot be used
    directly in documents. In order to improve the robustness of character
    encoding detection, the additional control characters #x7F through #x9F,
    which were freely allowed in XML 1.0 documents, now must also appear only
    as character references.
    ...
    "

IOW, the C0 control characters #x1 through #x1F (other than tab, LF, and CR)
are illegal in XML 1.0, but may appear in XML 1.1 *if encoded as a character
reference*.

So your particular code point, U+001A, may not appear at all if you mark
the documents as XML 1.0, and may appear only as a character reference
(&#x1A;) if you mark them as XML 1.1.
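
If it helps to see the two rules side by side, here is a rough sketch in
Python (not Xerces code; the function names are made up for illustration).
It assumes the Char production from XML 1.0 and the RestrictedChar
production from XML 1.1, and it only deals with control characters: it does
not escape markup such as & or <, and it does not add the XML 1.1
declaration for you.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# XML 1.1 "RestrictedChar" production: legal only as character references.
# Note that #x0 is not in this set; NUL is not allowed in XML 1.1 either.
_XML11_RESTRICTED = re.compile(r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]")


def find_xml10_illegal(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _XML10_ILLEGAL.finditer(text)]


def escape_for_xml11(text):
    """Replace XML 1.1 restricted characters with hex character references."""
    return _XML11_RESTRICTED.sub(lambda m: "&#x%X;" % ord(m.group()), text)
```

For text containing 0x1A, find_xml10_illegal() reports the offending
offset, and escape_for_xml11() rewrites the character as &#x1A;, which
is only well-formed in a document declared as version 1.1.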

Michael
