I've noticed that when I parse an XML document through JAXP with Xerces
into a DOM document object, the resulting Document object includes all
comments that occur in the external DTD subset as children of the
document node. As near as I can tell, this is not correct. Crimson does
not exhibit this problem.
Can anyone confirm or deny that this is a bug, and whether my
understanding of the problem is correct? The basic issue is this.
Consider this XML document:
<!DOCTYPE test SYSTEM "test.dtd">
<!-- Comment in document -->
<test>
Hello
</test>
And here's the DTD:
<!-- comment in DTD -->
<!ELEMENT test (#PCDATA)>
If we parse the XML document into a DOM Document object, how many
comment children should that Document object have? 1 or 2? In fact, in
Xerces 2.0.0 and 2.0.1 it has two, including the one from the external
DTD subset.
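To check the count without writing test.xml and test.dtd to disk, the same document can be parsed from strings. This is only a sketch. The class `CommentCount` and method `countDocumentComments` are hypothetical names. It assumes the only external entity is the DTD, so the `EntityResolver` returns the DTD text for every request. Unlike the test program below, it does not force Xerces. It uses whatever JAXP implementation is configured, so the number it prints depends on that parser and its version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CommentCount {

    static final String XML =
        "<!DOCTYPE test SYSTEM \"test.dtd\">\n"
      + "<!-- Comment in document -->\n"
      + "<test>\nHello\n</test>\n";

    static final String DTD =
        "<!-- comment in DTD -->\n"
      + "<!ELEMENT test (#PCDATA)>\n";

    // Hypothetical helper: parses xml, serving dtd for any external entity,
    // and counts the Comment nodes that are direct children of the Document.
    static int countDocumentComments(String xml, String dtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Assumption: the DTD is the only external entity, so every request gets it.
        builder.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader(dtd)));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        int count = 0;
        NodeList kids = doc.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.COMMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countDocumentComments(XML, DTD));
    }
}
```

By the reasoning in this message, the expected answer is 1. The report is that Xerces 2.0.0 and 2.0.1 give 2.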
Here's a simple program to demonstrate the problem:
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Test {

    public static void main(String[] args) {
        // Force JAXP to use Xerces rather than whatever parser is installed
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
            "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
        String input = "test.xml";
        if (args.length != 0) {
            input = args[0];
        }
        try {
            DocumentBuilderFactory factory
                = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Node doc = builder.parse(input);
            // Print the type code and value of each child of the Document node
            NodeList kids = doc.getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                Node node = kids.item(i);
                System.out.println(node.getNodeType() + ": "
                    + node.getNodeValue());
            }
        }
        catch (Exception e) {
            System.err.println(e);
            e.printStackTrace();
        }
    } // end main

} // end Test
And finally here's the incorrect output using Xerces-J 2.0.1:
D:\xml\bug>java Test
10: null
8: comment in DTD
8: Comment in document
1: null
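The numbers in that output are DOM node type codes. Decoded with the constants on `org.w3c.dom.Node`, the four children are a DOCUMENT_TYPE_NODE (10), two COMMENT_NODEs (8), and an ELEMENT_NODE (1). Below is a small illustrative helper that could replace the raw `getNodeType()` call in the test program. The class name `NodeTypeNames` is made up and is not part of any API.

```java
import org.w3c.dom.Node;

public class NodeTypeNames {

    // Maps a DOM nodeType code to the name of its org.w3c.dom.Node constant.
    static String name(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + type + ")";
        }
    }

    public static void main(String[] args) {
        // The codes that appear in the Xerces-J 2.0.1 output
        for (short code : new short[] {10, 8, 1}) {
            System.out.println(code + " = " + name(code)); // e.g. "8 = COMMENT_NODE"
        }
    }
}
```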
Notice that the comment from the DTD appears as a child of the Document
node, alongside the comment from the document itself. That's the
problem. I have submitted this to Bugzilla, but I'm not 100% sure it
really is a bug. Confirmation or denial would be appreciated.
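For comparison, SAX2's `LexicalHandler` keeps the two locations separate. DTD content, including comments from the external subset, is reported between the `startDTD` and `endDTD` callbacks. The sketch below uses that to sort comments into "in DTD" and "in document". It assumes the parser supports the standard `http://xml.org/sax/properties/lexical-handler` property. The class `CommentLocator` and its `locate` method are hypothetical names. Like any in-memory test, it assumes the DTD is the only external entity and serves it for every resolution request.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class CommentLocator extends DefaultHandler2 {

    final List<String> dtdComments = new ArrayList<>();
    final List<String> documentComments = new ArrayList<>();
    private final String dtd;
    private boolean inDtd = false;

    CommentLocator(String dtd) {
        this.dtd = dtd;
    }

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDtd = true;   // everything until endDTD belongs to the DTD
    }

    @Override
    public void endDTD() {
        inDtd = false;
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        (inDtd ? dtdComments : documentComments).add(text);
    }

    // Assumption: the DTD is the only external entity, so serve it for any request.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return new InputSource(new StringReader(dtd));
    }

    static CommentLocator locate(String xml, String dtd) throws Exception {
        CommentLocator handler = new CommentLocator(dtd);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Standard SAX2 property for registering a LexicalHandler
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE test SYSTEM \"test.dtd\">\n"
                   + "<!-- Comment in document -->\n"
                   + "<test>\nHello\n</test>\n";
        String dtd = "<!-- comment in DTD -->\n<!ELEMENT test (#PCDATA)>\n";
        CommentLocator result = locate(xml, dtd);
        System.out.println("In DTD:      " + result.dtdComments);
        System.out.println("In document: " + result.documentComments);
    }
}
```

Only the comments collected outside `startDTD`/`endDTD` come from the document itself. Those are the ones this message argues should become children of the Document node.\n",
    "<!-- comment in DTD -->\n<!ELEMENT test (#PCDATA)>\n");
assert r.documentComments.size() == 1;
assert r.documentComments.get(0).trim().equals("Comment in document");
assert r.dtdComments.size() == 1;
assert r.dtdComments.get(0).trim().equals("comment in DTD");
</test>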
--
Elliotte Rusty Harold, Writer/Programmer