Xerces validates the content of an element only after the element is closed, so that it has the complete list of child nodes to look at.
You will always get parsing errors as soon as they are encountered; only validation errors will be reported after the endElement notification.
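To illustrate the timing, here is a rough sketch of the idea. It is not the Xerces code: it uses Python's xml.sax with a non-validating parser, and CONTENT_MODELS, CloseTimeChecker and check are hypothetical names made up for this example. The handler can only compare the children against the model once the parent closes, so the "invalid" event necessarily comes after all the child notifications.

```python
import xml.sax

# Hypothetical content models: element name -> required sequence of children.
CONTENT_MODELS = {"book": ["author", "title", "price"]}


class CloseTimeChecker(xml.sax.ContentHandler):
    """Collects child names per open element and checks them at endElement."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one list of child names per open element
        self.events = []  # notifications and errors, in the order they occur

    def startElement(self, name, attrs):
        if self.stack:
            # Record this element as a child of its parent.
            self.stack[-1].append(name)
        self.stack.append([])
        self.events.append("start " + name)

    def endElement(self, name):
        children = self.stack.pop()
        self.events.append("end " + name)
        expected = CONTENT_MODELS.get(name)
        # Only now is the full child list known, so only now is it compared.
        if expected is not None and children != expected:
            self.events.append("invalid " + name)


def check(xml_text):
    """Parse xml_text and return the ordered list of events."""
    handler = CloseTimeChecker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Calling check() on a book whose title comes before its author yields "invalid book" only after the "end price" and "end book" events, which is the behaviour described above.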
Alberto
At 13.03 16/03/2004 +0100, Robert Zimmermann wrote:
Hello,
I am not sure whether this can be treated as a bug, so I am asking it as a question first:
Some DTD failures are reported by Xerces too late; in my case it is the wrong order of elements. This behaviour breaks my SAX parsing code, because my interface relies on Xerces to report invalid XML in time and, as a consequence, attempts to create a piece of information in the wrong place (which causes a segfault on Linux). I have tested the same DTD/XML with a Python SAX parser, which reports the DTD error in time.
I think anyone who implements XML parsing with a SAX interface should be able to rely on DTD failures being reported in time.
Sample Code:
DTD:
---------------------------
<!ELEMENT books (book)*>
<!-- author first, then title followed by price -->
<!ELEMENT book (author, title, price)>
<!ATTLIST book category CDATA #REQUIRED >
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
---------------------------
XML (not a valid one)
---------------------------
<!DOCTYPE books SYSTEM "books.dtd">
<books>
  <book category="reference">
    <title>Sayings of the Century</title>
    <author>Nigel Rees</author>
    <price>8.95</price>
  </book>
</books>
---------------------------
In this XML the <author> element is in the wrong position. Xerces reports this error after <price>, more precisely when book is closed.
Also, the reported line number is that of the closing book element, not, as I would expect, the line of the author element.
Anyway, the wrong line or column numbers in error reporting are not too bad, but the late exception in a SAX handler is fatal. Sure, with the appropriate knowledge the SAX handler could be implemented more robustly, but then everything already declared in the grammar (DTD or WXS) has to be taken care of once again inside the SAX handler.
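For what it is worth, a more robust handler might look roughly like the sketch below. It is written against Python's xml.sax rather than the Xerces C++ API, and BookHandler and REQUIRED_FIELDS are hypothetical names chosen for this example. The assumption is that each book is buffered by field name instead of by position, and only committed once the book element is closed and all fields from the DTD above are present.

```python
import xml.sax

REQUIRED_FIELDS = ("author", "title", "price")  # children of book in the DTD


class BookHandler(xml.sax.ContentHandler):
    """Buffers each book and commits it only when it is complete."""

    def __init__(self):
        super().__init__()
        self.books = []      # committed records
        self.rejected = 0    # books dropped because a field was missing
        self.current = None  # fields of the book being read, or None
        self.field = None    # name of the field whose text is being read

    def startElement(self, name, attrs):
        if name == "book":
            self.current = {"category": attrs.get("category")}
        elif self.current is not None and name in REQUIRED_FIELDS:
            # Store by name, so the order of the children does not matter here.
            self.field = name
            self.current[name] = ""

    def characters(self, content):
        if self.field is not None:
            self.current[self.field] += content

    def endElement(self, name):
        if name == self.field:
            self.field = None
        elif name == "book":
            # Nothing is created outside the buffer before this point.
            if all(f in self.current for f in REQUIRED_FIELDS):
                self.books.append(self.current)
            else:
                self.rejected += 1
            self.current = None
```

This does repeat part of the grammar inside the handler, which is exactly the duplication complained about above. Note also that it accepts a book whose children are merely out of order; if such documents have to be rejected as a whole, the commit would need to wait until the validator has reported its errors, for example until the end of the document.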
What do you guys think about this?
Error reported by SAXPrint sample of Xerces 2.5:
Error at file books_bad.xml, line 7, char 10
Message: Element 'title' is not valid for content model: '(author,title,price)'
Error reported by xmlproc of Python:
xml.sax._exceptions.SAXParseException: books_bad.xml:4:12: Element 'author' missing before element 'title'
Thanks, Robert
WXS = W3C XML Schema
Robert Zimmermann Softwaredevelopment WEB.DE AG http://ComWin.name/[EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
