http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623

           Summary: Tabulator (U+0009) character in element attribute not
                    serialized as numerical entity by default xml serializer
           Product: XalanJ2
           Version: 2.5Dx
          Platform: All
               URL: http://groups.google.de/groups?q=tabulator+attribute+xslt&hl=de&lr=&ie=UTF-8&selm=1g000ah.cqv8s26o5x8gN%25roth%40visualclick.de&rnum=1
        OS/Version: MacOS X
            Status: NEW
          Severity: Major
          Priority: Other
         Component: org.apache.xalan.serialize
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

[applies to: XalanJ2 2.5D1]

SUMMARY:

The default XML serializer needs to write tabulator (U+0009), CR and LF characters in element attribute values as numerical entities (numeric character references) at serialization time. Otherwise, because of the attribute normalization rules outlined in the XML 1.0 spec, re-parsing the output with a conforming XML 1.0 parser changes the document semantically (i.e. the tab is replaced by a single space).

REPRODUCTION INFO & DETAILS:

When using this "identity" processing sheet:

--snip--
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
--snip--

on this XML instance document:

--snip--
<?xml version="1.0" encoding="iso-8859-1" ?>
<element attr="a&#9;tab" />
--snip--

the result is:

--snip--
<?xml version="1.0" encoding="iso-8859-1"?>
<element attr="a	tab"/>
--snip--
                ^-- Tabulator (0x9)

i.e. the &#9; numerical entity from the input document is not recreated at serialization time; the serializer simply writes the real character, a tab, in its place.
Unfortunately, this means that re-applying the identity stylesheet from above to this output replaces the tab character with a single space character, according to the Attribute-Value Normalization rules (<http://www.w3.org/TR/REC-xml#AVNormalize>):

--snip--
<?xml version="1.0" encoding="iso-8859-1"?>
<element attr="a tab"/>
--snip--
                ^-- Space (0x20)

In short: the above "identity" processing sheet does not deliver a semantically identical document. For it to do so, the tab character in the attribute value would need to be written as a numerical entity, so that a later parser recreates the tab character in the attribute value (rather than normalizing it away to a single space).

Christian Roth
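
The parser-side half of this report can be checked independently of Xalan. What follows is only a minimal sketch in Python, assuming the standard library's expat-based ElementTree parser as a stand-in for "a conforming XML 1.0 parser". The names parse_attr and escape_attr are hypothetical helpers made up for this illustration; they are not Xalan code, and escape_attr only hints at what a corrected serializer could emit.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_attr(xml_text):
    """Parse xml_text and return the value of 'attr' on the root element."""
    return ET.fromstring(xml_text).get("attr")


def escape_attr(value):
    """Hypothetical helper: escape an attribute value so that tab, LF and CR
    are written as character references and survive a later re-parse."""
    return escape(
        value,
        {'"': "&quot;", "\t": "&#9;", "\n": "&#10;", "\r": "&#13;"},
    )


# Input as in the report: the tab is given as a character reference.
with_reference = '<element attr="a&#9;tab"/>'
# Output as described in the report: the tab is written as a literal character.
with_literal_tab = '<element attr="a\ttab"/>'

# First parse: the reference is resolved to a real tab (no normalization).
first_pass = parse_attr(with_reference)
# Second parse: the literal tab is normalized to a single space.
second_pass = parse_attr(with_literal_tab)

# Writing the tab back as &#9; keeps the value stable across a re-parse.
fixed = '<element attr="%s"/>' % escape_attr(first_pass)
round_trip = parse_attr(fixed)

print(repr(first_pass), repr(second_pass), repr(round_trip))
```

Under these assumptions, the value read from the document with the reference still contains the tab, the value read from the document with the literal tab contains a space, and the value escaped by escape_attr comes back unchanged.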
