DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623

Tabulator (U+0009) character in element attribute not serialized as numerical entity by default xml serializer

------- Additional Comments From [EMAIL PROTECTED] 2003-09-04 17:23 -------

The XML 1.0 recommendation, in section 3.3.3 on attribute-value normalization (http://www.w3.org/TR/REC-xml#AVNormalize), says to apply the first one of these rules that applies:

For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

1) For a character reference, append the referenced character to the normalized value.
2) For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
3) For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
4) For another character, append the character to the normalized value.

-------------------------------------

So if the serializer does see #xD, #xA or #x9 in an attribute, one reason can be that they came from entity references, and as such they should be output as entity references. (If they had come in as literal characters, they would have been normalized to spaces before the serializer saw them.)

When testing the patch that I am about to attach to this bug, one testcase failed: attribset22. It failed because a tab character (not a character reference) was turned into a character reference. The XSL in that testcase looked like this:

<Out><xsl:attribute name="a">x y</xsl:attribute></Out>

Between the 'x' and 'y' in the text node above were a newline character, a tab character, and two spaces (my editor is not being friendly, so I just put 8 spaces in this append, but it was a tab).
The expected output in the master file was:

<Out a="x y" />

With the patch, what came out was:

<Out a="x &#9; y" />

Looking into this further, the XSLT 1.0 recommendation (http://www.w3.org/TR/1999/REC-xslt-19991116#creating-attributes) says this in a note on creating attributes:

When an xsl:attribute contains a text node with a newline, then the XML output must contain a character reference.

It is curious that they don't mention tab or carriage return. Yet this is only a "note".

If the patch that I am about to attach is applied to Apache, then the gold file for attribset22 will need to change. The tab character in it will need to change to a character reference, &#9;

- Brian Minchau
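
To illustrate the reasoning above, here is a minimal Python sketch. It is not the attached patch and not Xalan's serializer code; the names normalize_attribute and escape_attribute are hypothetical. It covers rules 1, 3 and 4 of the normalization algorithm (rule 2, general entity references, is left out) and shows why a serializer has to write tab, newline and carriage return as character references if they are to survive a re-parse.

```python
import re

# One token per match: a hex character reference, a decimal character
# reference, or any single character (DOTALL so "." also matches newline).
TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|(.)", re.DOTALL)


def normalize_attribute(raw):
    """Sketch of XML 1.0 section 3.3.3, rules 1, 3 and 4 only.

    Assumption: general entity references (rule 2) never occur in `raw`.
    """
    out = []
    for hex_ref, dec_ref, char in TOKEN.findall(raw):
        if hex_ref:                      # rule 1: character reference
            out.append(chr(int(hex_ref, 16)))
        elif dec_ref:                    # rule 1: character reference
            out.append(chr(int(dec_ref)))
        elif char in " \r\n\t":          # rule 3: white space becomes #x20
            out.append(" ")
        else:                            # rule 4: any other character
            out.append(char)
    return "".join(out)


def escape_attribute(value):
    """Hypothetical serializer step: write tab, newline and carriage
    return as character references so a later parse preserves them."""
    replacements = {
        "&": "&amp;",
        "<": "&lt;",
        '"': "&quot;",
        "\t": "&#9;",
        "\n": "&#10;",
        "\r": "&#13;",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


# Attribute value as in attribset22: newline, tab, two spaces.
value = "x\n\t  y"

# Written literally, the white space collapses to spaces on re-parse.
print(repr(normalize_attribute(value)))

# Written as character references, the original value comes back.
print(repr(normalize_attribute(escape_attribute(value))))
```

The second call returns the original value, while the first returns only spaces between 'x' and 'y', which is the difference between the patched output and the old gold file.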
