http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623

Tabulator (U+0009) character in element attribute not serialized as numerical entity by default xml serializer

------- Additional Comments From [EMAIL PROTECTED] 2003-09-27 00:06 -------

Looking into this further, the XSLT 1.0 recommendation ( http://www.w3.org/TR/1999/REC-xslt-19991116#creating-attributes ) says this in a note on creating attributes:

> When an xsl:attribute contains a text node with a newline,
> then the XML output must contain a character reference.

It is curious that the note doesn't mention tab or carriage return. Yet it is only a "note", so it may not be complete.

The XSLT 2.0 serialization draft is clearer. It describes the future rather than the XSLT 1.0 recommendation, but I think it is just a clarification of XSLT 1.0 serialization. At http://www.w3.org/TR/xslt-xquery-serialization/#xml-output it says:

> A consequence of this rule is that certain whitespace
> characters should be output as character references,
> to ensure that they survive the round trip through serialization and
> parsing. Specifically, CR characters in text nodes
> should be written as &#xD; or an equivalent;
> while CR, NL, and TAB characters in attribute nodes should be output
> respectively as &#xD;, &#xA;, and &#x9;, or their equivalents.

As the quoted text says, the point is "to ensure that they survive the round trip". If we do not output the tab as a character reference, then using the output document as input to another transform will lose the tab forever (the parser will turn it into a space).

In my opinion the note in the XSLT 1.0 recommendation simply mentions the newline as an example, and could just as easily have mentioned a tab. Given the text in the XSLT 2.0 draft, which is not in a note, I think the intention is clear. Henry Zongaro has also pointed out to me a section of the XSLT 1.0 recommendation that implies the same "round trip" behavior is needed for 1.0.

Xerces will turn the character reference into a character before Xalan-J sees it, so the only way to fix this is for Xalan-J to turn a real tab in an attribute back into a character reference. This happens to break attribset22, so changes are needed to that testcase.

===================================================================

The design of the patch was to use two different methods of CharInfo, one for characters in an attribute and one for characters in text. The original design doesn't make the distinction. Henry Zongaro reviewed my original patch and caught my mental oversight. Here are his comments:

----------------------------------------------------------
My only comment on the patch is that the way the "fromTextNode" flag is used doesn't look correct. There are conditions like the following:

    if ((fromTextNode && m_charInfo.isSpecialTextChar(ch)) || m_charInfo.isSpecialAttrChar(ch))

which means that if m_charInfo.isSpecialAttrChar(ch) is true, the entire condition is true, regardless of the setting of fromTextNode. You probably intended the following.

    if ((fromTextNode && m_charInfo.isSpecialTextChar(ch)) || (!fromTextNode && m_charInfo.isSpecialAttrChar(ch)))

------------------------------------------------------------

The slightly reworked patch, to be attached very soon, will include fixes for Henry's observation.
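
For illustration only, here is a minimal Java sketch of the two conditions Henry compares and of the round-trip problem. It is not the patch itself. The class EscapeSketch, the character sets in its two predicates, and the helpers escape and parseAttr are all assumptions made for this example. Only the shape of the two conditions comes from the comments above, and the real CharInfo methods may classify characters differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the serializer logic; not Xalan-J's actual classes.
final class EscapeSketch {

    // Assumed set: characters that need escaping inside a text node.
    static boolean isSpecialTextChar(char ch) {
        return ch == '<' || ch == '&' || ch == '\r';
    }

    // Assumed set: attribute values also need '"', tab, and newline escaped.
    static boolean isSpecialAttrChar(char ch) {
        return ch == '<' || ch == '&' || ch == '"'
            || ch == '\t' || ch == '\n' || ch == '\r';
    }

    // Shape of the original condition: the attribute test applies even to text nodes.
    static boolean needsEscapeBuggy(char ch, boolean fromTextNode) {
        return (fromTextNode && isSpecialTextChar(ch)) || isSpecialAttrChar(ch);
    }

    // Shape of the corrected condition: each test applies only to its own node kind.
    static boolean needsEscapeFixed(char ch, boolean fromTextNode) {
        return (fromTextNode && isSpecialTextChar(ch))
            || (!fromTextNode && isSpecialAttrChar(ch));
    }

    // Writes special characters as decimal character references.
    static String escape(String s, boolean fromTextNode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (needsEscapeFixed(ch, fromTextNode)) {
                out.append("&#").append((int) ch).append(';');
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Parses <e a="..."/> and returns the attribute value as the parser reports it.
    static String parseAttr(String attrSource) {
        try {
            String xml = "<e a=\"" + attrSource + "\"/>";
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("a");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these assumed predicates, needsEscapeBuggy('\t', true) is true while needsEscapeFixed('\t', true) is false, so only the corrected condition leaves a tab in a text node alone. On the round trip, parseAttr("a\tb") comes back with a space where the tab was, because the parser normalizes literal whitespace in attribute values, whereas parseAttr(escape("a\tb", false)) keeps the tab.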
