http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623

Tabulator (U+0009) character in element attribute not serialized as numerical entity 
by default xml serializer





------- Additional Comments From [EMAIL PROTECTED]  2003-09-04 17:23 -------
Section 3.3.3 of the XML 1.0 recommendation 
(http://www.w3.org/TR/REC-xml#AVNormalize), on attribute-value normalization, 
says to apply the first of these rules that matches:

For each character, entity reference, or character reference in the 
unnormalized attribute value, beginning with the first and continuing to the 
last, do the following:
1) For a character reference, append the referenced character to the normalized 
value.
2) For an entity reference, recursively apply step 3 of this algorithm to the 
replacement text of the entity.
3) For a white space character (#x20, #xD, #xA, #x9), append a space character 
(#x20) to the normalized value.
4) For another character, append the character to the normalized value.
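
The four rules above can be sketched in a few lines of Python. This is only an 
illustration, not the parser's actual code: the function name, the regular 
expression and the "entities" table (entity name mapped to replacement text) are 
all assumptions made for the example, and the end-of-line handling that the 
recommendation applies before this step is left out.

```python
import re

# Hypothetical tokenizer: hex character reference, decimal character
# reference, entity reference, or any single literal character.
_TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.\-:]*);|(.)",
    re.DOTALL,
)


def normalize_attribute_value(raw, entities=None):
    """Sketch of XML 1.0 section 3.3.3 for CDATA attributes.

    `entities` is an assumed mapping of entity name to replacement text;
    an undeclared entity raises KeyError in this sketch.
    """
    entities = entities or {}
    out = []
    for hex_ref, dec_ref, name, char in (m.groups() for m in _TOKEN.finditer(raw)):
        if hex_ref is not None:
            # Rule 1: a character reference is appended as the referenced character.
            out.append(chr(int(hex_ref, 16)))
        elif dec_ref is not None:
            out.append(chr(int(dec_ref)))
        elif name is not None:
            # Rule 2: recurse into the replacement text of the entity.
            out.append(normalize_attribute_value(entities[name], entities))
        elif char in " \r\n\t":
            # Rule 3: literal white space becomes a single space.
            out.append(" ")
        else:
            # Rule 4: any other character is copied unchanged.
            out.append(char)
    return "".join(out)
```

With this sketch, a literal newline or tab in the input comes out as a space, 
while &#10; and &#9; come out as the real newline and tab characters. That is why 
only character references can carry these characters through normalization.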

-------------------------------------
So if the serializer does see #xD, #xA or #x9 in an attribute value, one likely 
reason is that they came from character references, and as such they should be 
output as character references (had they come in as literal characters, they 
would have been normalized to spaces before the serializer saw them).
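
As a rough sketch of that serializer behaviour (not the code in the patch; the 
table and function name below are hypothetical), the white space characters can 
be written out as numeric character references alongside the usual markup 
escapes:

```python
# Hypothetical escape table for attribute values. The three white space
# entries are the ones discussed in this bug; the others are the usual
# markup characters that must be escaped inside a double-quoted attribute.
_ATTR_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    '"': "&quot;",
    "\t": "&#9;",
    "\n": "&#10;",
    "\r": "&#13;",
}


def escape_attribute_value(value):
    """Return `value` escaped for use inside a double-quoted XML attribute."""
    return "".join(_ATTR_ESCAPES.get(ch, ch) for ch in value)
```

For example, escape_attribute_value("x\n\t  y") gives x&#10;&#9;  y, so a parser 
that reads the output back gets the newline and tab again instead of two spaces.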

When I tested the patch that I am about to attach to this bug, one testcase, 
attribset22, failed.  It failed because a tab character (not a character 
reference) was turned into a character reference.  The XSL in that testcase 
looked like this:
<Out><xsl:attribute name="a">x
          y</xsl:attribute></Out>

Between the 'x' and 'y' in the text node above were a newline character, a tab 
character and two spaces (my editor is not being friendly, so I just put 8 
spaces in this comment, but it was a tab).

The expected output in the master file was
<Out a="x&#10;          y" />

With the patch what came out was:
<Out a="x&#10;&#9;  y" />

Looking into this further, the XSLT 1.0 recommendation 
( http://www.w3.org/TR/1999/REC-xslt-19991116#creating-attributes ) 
says this in a note on creating attributes:
     When an xsl:attribute contains a text node with a newline, 
     then the XML output must contain a character reference. 

It is curious that they don't mention tab or carriage-return.  Yet this is only 
a "note".

If the patch that I am about to attach is applied to Apache then the gold file 
for attribset22 will need to change.  The tab character in it will need to 
become a character reference, &#9;.

- Brian Minchau
