http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22623

Tabulator (U+0009) character in element attribute not serialized as numerical entity 
by default xml serializer





------- Additional Comments From [EMAIL PROTECTED]  2003-09-27 00:06 -------

Looking into this further, the XSLT 1.0 recommendation ( 
http://www.w3.org/TR/1999/REC-xslt-19991116#creating-attributes )
says this in a note on creating attributes:
>     When an xsl:attribute contains a text node with a newline, 
>     then the XML output must contain a character reference. 

It is curious that they don't mention tab or carriage-return.  Yet this is only 
a "note" so it may not be complete.

In the XSLT 2.0 serialization draft things are clearer. That draft describes 
the future, not the XSLT 1.0 recommendation, but I think it simply clarifies 
XSLT 1.0 serialization. At 
http://www.w3.org/TR/xslt-xquery-serialization/#xml-output it says:
>    A consequence of this rule is that certain whitespace 
>    characters should be output as character references,
>    to ensure that they survive the round trip through serialization and 
>    parsing. Specifically, CR characters in text nodes 
>    should be written as &#xD; or an equivalent; 
>    while CR, NL, and TAB characters in attribute nodes should be output
>    respectively as &#xD;, &#xA;, and &#x9;, or their equivalents.

As the draft text quoted above says, this is "to ensure that they survive the 
round trip".  If we do not output the tab as a character reference, then using 
the output document as input to another transform will lose the tab forever 
(the parser will turn it into a space).
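
To make the round-trip problem concrete, here is a minimal sketch using the 
standard JAXP DOM parser. The class name AttrRoundTrip and the helper parseAttr 
are made up for this illustration and are not part of Xalan-J or Xerces. It 
parses the same attribute twice, once with a literal tab and once with the 
character reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not Xalan-J code.
class AttrRoundTrip {
    // Parses a small document and returns the value of attribute "a"
    // on the root element, as seen after the parser has processed it.
    static String parseAttr(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("a");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Literal tab in the attribute: normalized to a space by the parser.
        String literal = parseAttr("<e a=\"x\ty\"/>");
        // Tab written as a character reference: survives as a real tab.
        String reference = parseAttr("<e a=\"x&#x9;y\"/>");
        System.out.println("literal tab kept:   " + literal.equals("x\ty"));
        System.out.println("reference tab kept: " + reference.equals("x\ty"));
    }
}
```

The first value comes back with a space where the tab was, which is the loss 
described above; the second keeps the tab.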

In my opinion the note in the XSLT 1.0 recommendation simply mentions the 
newline as an example, but could just as easily have mentioned a tab. Given the 
text in the XSLT 2.0 draft, which is not in a note, I think that the intention 
is clear.  Henry Zongaro has also pointed out to me a section of the XSLT 1.0 
recommendation that implies the same "round trip" behavior is needed for 1.0.

Xerces will turn the character reference into a character before Xalan-J sees 
it, so the only way to fix it is for Xalan-J to turn a real tab in an attribute 
back into a character reference. This happens to break attribset22, so that 
test case needs to change.
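
As a rough sketch of what "turn a real tab back into a character reference" 
means on the output side, the following standalone helper escapes an attribute 
value. The class AttrEscapeSketch and the method escapeAttr are hypothetical 
names for this example only; this is not the Xalan-J serializer code, and the 
set of escaped characters is my assumption based on the XSLT 2.0 draft text 
quoted above.

```java
// Hypothetical illustration, not the actual Xalan-J serializer.
class AttrEscapeSketch {
    // Returns the attribute value with markup characters and the
    // whitespace characters TAB, NL and CR written as references.
    static String escapeAttr(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            switch (ch) {
                case '\t': out.append("&#x9;"); break;  // tab
                case '\n': out.append("&#xA;"); break;  // newline
                case '\r': out.append("&#xD;"); break;  // carriage return
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Writes the tab as a reference so a later parse can restore it.
        System.out.println("<e a=\"" + escapeAttr("x\ty") + "\"/>");
    }
}
```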


===================================================================

The patch uses two different CharInfo methods, one for characters in an
attribute and one for characters in text.  The original design doesn't make
that distinction.

Henry Zongaro reviewed my original patch and caught my mental oversight.
Here were his comments:
----------------------------------------------------------
     My only comment on the patch is that the way the "fromTextNode" flag is 
used doesn't look correct.  There are conditions like the following:

if ((fromTextNode && m_charInfo.isSpecialTextChar(ch))
    || m_charInfo.isSpecialAttrChar(ch))

which means that if m_charInfo.isSpecialAttrChar(ch) is true, the entire 
condition is true, regardless of the setting of fromTextNode.  You probably 
intended the following.

if ((fromTextNode && m_charInfo.isSpecialTextChar(ch))
    || (!fromTextNode && m_charInfo.isSpecialAttrChar(ch)))
------------------------------------------------------------
The slightly reworked patch, to be attached very soon, will include a fix for 
Henry's observation.
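
To show the difference between the two conditions Henry compares, here is a 
small self-contained sketch. The two predicates are simplified stand-ins I 
made up for the real CharInfo methods (the actual character sets in Xalan-J 
may differ), and the class name SpecialCharCheck is hypothetical.

```java
// Hypothetical illustration; the predicates below only approximate CharInfo.
class SpecialCharCheck {
    // Assumed stand-in: characters that need escaping in a text node.
    static boolean isSpecialTextChar(char ch) {
        return ch == '<' || ch == '&';
    }

    // Assumed stand-in: characters that need escaping in an attribute value.
    static boolean isSpecialAttrChar(char ch) {
        return ch == '<' || ch == '&' || ch == '"'
            || ch == '\t' || ch == '\n' || ch == '\r';
    }

    // Condition from the original patch: the attribute test applies
    // even when the character comes from a text node.
    static boolean original(boolean fromTextNode, char ch) {
        return (fromTextNode && isSpecialTextChar(ch))
            || isSpecialAttrChar(ch);
    }

    // Condition Henry suggested: each test applies only to its own context.
    static boolean corrected(boolean fromTextNode, char ch) {
        return (fromTextNode && isSpecialTextChar(ch))
            || (!fromTextNode && isSpecialAttrChar(ch));
    }

    public static void main(String[] args) {
        // A tab in a text node: the two conditions disagree here.
        System.out.println("original:  " + original(true, '\t'));
        System.out.println("corrected: " + corrected(true, '\t'));
    }
}
```

With these stand-ins, a tab coming from a text node is treated as special by 
the original condition but not by the corrected one, while a tab in an 
attribute is treated as special by both.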
