http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16295

DOMPrint Entity References

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From [EMAIL PROTECTED]  2003-01-23 20:23 -------
First of all, please upgrade to the latest parser if possible. There have been
many fixes since Xerces-C++ 1.4.

With the latest parser, Xerces-C++ 2.1, DOMPrint prints this:
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <Test>&lt;&gt;&amp;'"</Test>

where <, > and & are represented as entity references, while ' and " are printed
as is. DOMPrint is behaving as designed.

According to the XML 1.0 specification, section 4.4 "XML Processor Treatment of
Entities and References", the parser must expand the entity references in the
XML document, replacing each one with the character it stands for.

Thus the string the parser actually produces is something like:
  <Test><>&'"</Test>

When DOMPrint then writes the string out, it cannot print it as is, because
DOMPrint is supposed to generate output that can be parsed again if it is sent
back to the parser.
   
So DOMPrint does some "touch-up", JUST ENOUGH to make the string parsable.

From DOMPrint's perspective, it does NOT know what the original string was; it
may have been
   <Test>&lt;&gt;&amp;&apos;&quot;</Test>
or
   <Test>&lt;&gt;&amp;'"</Test>

It does NOT know.   All it sees is
  <Test><>&'"</Test>
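A minimal C++ sketch of why that information is lost. `expandPredefinedEntities` is a hypothetical helper written for illustration (it is not part of Xerces-C++); it assumes the content uses only the five predefined entities and ignores numeric character references and DTD-declared entities.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C++): expands only the five
// predefined XML entities. Numeric character references (e.g. &#60;) and
// user-defined entities are deliberately not handled in this sketch.
std::string expandPredefinedEntities(const std::string& in) {
    struct Entity { std::string ref; char ch; };
    static const Entity kEntities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&apos;", '\''}, {"&quot;", '"'},
    };

    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            // Try each predefined entity reference at the current position.
            for (const Entity& e : kEntities) {
                if (in.compare(i, e.ref.size(), e.ref) == 0) {
                    out += e.ch;
                    i += e.ref.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            // Ordinary character (or an unrecognised '&'): copy it through.
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Both `expandPredefinedEntities("&lt;&gt;&amp;&apos;&quot;")` and `expandPredefinedEntities("&lt;&gt;&amp;'\"")` return `<>&'"`, so a serializer that only sees the expanded string has no way to tell which spelling the document originally used.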

Per the XML 1.0 spec, a literal < or & is not allowed in text content; either
one makes the document not well-formed. So DOMPrint replaces them with &lt; and
&amp; respectively. A literal > is permitted in text except in the sequence ]]>,
but DOMPrint escapes it as &gt; as well, which is always safe.
The ' and " characters are fine in text content, so DOMPrint leaves them alone.
DOMPrint touches up the string just enough to make it parsable.

Thus you get
   <Test>&lt;&gt;&amp;'"</Test>
from DOMPrint.
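The touch-up rule for text content can be sketched in C++ as follows. `escapeTextContent` is a hypothetical name used for illustration; it is not DOMPrint's actual code, and it only covers element text, not attribute values.

```cpp
#include <string>

// Hypothetical helper (not the actual DOMPrint/Xerces code): escapes
// element text content "just enough" to be well-formed XML.
// '&' and '<' must be escaped in character data; '>' is escaped too,
// which also takes care of the "]]>" sequence. Quotes are left untouched
// because they only matter inside attribute values.
std::string escapeTextContent(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}
```

For example, `"<Test>" + escapeTextContent("<>&'\"") + "</Test>"` yields `<Test>&lt;&gt;&amp;'"</Test>`. Attribute values would need more care, since the quote character used as the delimiter must also be escaped there.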
