On Thu, Feb 03, 2005 at 07:39:27AM -0700, [EMAIL PROTECTED] wrote:
> Recently, we found out that some of our XML text nodes contain the 0x1A
> character. This causes the Xerces parser to throw a Invalid character
> (Unicode: 0x1A) error.
>
> Upon investigating the XML specs, the XML 1.0 Spec does not show that in
> the list of valid characters. However, the XML 1.1 spec points to it as
> a valid, but restricted (not sure what that means).

See the XML 1.1 Recommendation:

  http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-xml11

  "
  1.3 Rationale and list of changes for XML 1.1
  ...
  Therefore, XML 1.1 allows the use of character references to the
  control characters #x1 through #x1F, most of which are forbidden in
  XML 1.0. For reasons of robustness, however, these characters still
  cannot be used directly in documents. In order to improve the
  robustness of character encoding detection, the additional control
  characters #x7F through #x9F, which were freely allowed in XML 1.0
  documents, now must also appear only as character references.
  ...
  "

IOW, the C0 control characters (other than tab, LF, and CR) are illegal
in XML 1.0, but may appear in XML 1.1 *if encoded as a character
reference*. That is what "restricted" means.

So, your particular Unicode code point 0x1A is not allowed to be present
at all if you mark the documents as XML 1.0, and may only be present if
encoded as &#x1A; if you mark the documents as XML 1.1.

Michael
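
P.S. To make the two options concrete, here is a minimal sketch in
Python. It is not Xerces code, and the helper names strip_for_xml10 and
escape_for_xml11 are made up for this example. The character classes are
my reading of the Char production in XML 1.0 and the RestrictedChar
production in XML 1.1.

```python
import re

# Anything outside the XML 1.0 "Char" production: cannot appear in an
# XML 1.0 document in any form, so the only option is to drop or replace it.
_XML10_INVALID = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# The XML 1.1 "RestrictedChar" production: allowed in an XML 1.1 document,
# but only when written as a character reference.
_XML11_RESTRICTED = re.compile(
    r"[\u0001-\u0008\u000B\u000C\u000E-\u001F\u007F-\u0084\u0086-\u009F]"
)


def strip_for_xml10(text):
    """Hypothetical helper: drop characters that XML 1.0 forbids (e.g. U+001A)."""
    return _XML10_INVALID.sub("", text)


def escape_for_xml11(text):
    """Hypothetical helper: turn XML 1.1 restricted characters into references."""
    return _XML11_RESTRICTED.sub(
        lambda m: "&#x{:X};".format(ord(m.group())), text
    )


if __name__ == "__main__":
    sample = "before\x1aafter"
    print(strip_for_xml10(sample))
    print(escape_for_xml11(sample))
```

Note that escape_for_xml11 only handles the restricted characters. It
does not escape & or <, and its output is only well-formed inside a
document whose declaration says version="1.1", read by a parser that
supports XML 1.1.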