Hi Robert,
Xerces validates an element's content only after the element is closed, because only then does it have the complete list of child nodes to check against the content model.
Parsing (well-formedness) errors are always reported as soon as they are encountered; only validation errors are reported after the endElement notification.
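
One way to live with this is to not commit anything until the parent element has been closed. Below is a minimal sketch of that idea, written in Python with the standard (non-validating) xml.sax module rather than Xerces itself. The BookHandler class, the parse_books helper and the hand-written order check are hypothetical; the check only stands in for the validation error a validating parser would deliver when book is closed.

```python
import xml.sax

# Child order required by the DTD content model (author, title, price).
EXPECTED_ORDER = ["author", "title", "price"]


class BookHandler(xml.sax.ContentHandler):
    """Hypothetical handler: buffers a <book> and commits it only on close."""

    def __init__(self):
        super().__init__()
        self.books = []        # books that passed the check
        self._pending = None   # (name, text) pairs of the currently open <book>
        self._field = None     # child element currently being read
        self._text = []        # text chunks of that child

    def startElement(self, name, attrs):
        if name == "book":
            self._pending = []
        elif self._pending is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text outside a child of <book> (e.g. indentation) is ignored.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "book":
            order = [child for child, _ in self._pending]
            if order != EXPECTED_ORDER:
                # Nothing has been committed yet, so it is safe to abort here.
                raise xml.sax.SAXException(
                    "book children out of order: %s" % order)
            self.books.append(dict(self._pending))
            self._pending = None
        elif self._field is not None and name == self._field:
            self._pending.append((name, "".join(self._text)))
            self._field = None


def parse_books(xml_bytes):
    """Parse a books document and return only the fully checked books."""
    handler = BookHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.books
```

With a document whose book has title before author, parse_books raises instead of handing back a half-built book, so the code behind the handler never sees data in the wrong place.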


Alberto

At 13.03 16/03/2004 +0100, Robert Zimmermann wrote:
Hello,

I am not sure whether this can be treated as a bug, so first as a question:

Some DTD validation failures are reported by Xerces too late; in my case it
is the wrong order of elements.
This behaviour breaks my SAX parsing code, because my interface relies on
Xerces to report invalid XML in time, and as a consequence it attempts to
create a piece of information in the wrong place (which causes a segfault on
Linux).
I have tested the same DTD/XML with a Python SAX parser, which reports the
DTD error in time.

I think anyone who implements XML parsing with a SAX interface should be able
to rely on DTD failures being reported in time.

Sample Code:
DTD:
---------------------------
<!ELEMENT books (book)*>
<!-- author first, then title followed by price -->
<!ELEMENT book (author, title, price)>
<!ATTLIST book
    category CDATA #REQUIRED
>
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
---------------------------

XML (not valid):
---------------------------
<!DOCTYPE books SYSTEM "books.dtd">
<books>
  <book category="reference">
    <title>Sayings of the Century</title>
    <author>Nigel Rees</author>
    <price>8.95</price>
  </book>
</books>
---------------------------

In this XML the <author> element is in the wrong position.
Xerces reports this error after <price>, more precisely when book is closed.

Also, the row number reported for the error is that of the closing book
element, not, as I would expect, the row of the author element.

Anyway, the wrong row or column numbers in error reporting are not too bad,
but the late exception in a SAX handler is fatal. Sure, with the appropriate
knowledge the SAX handler could be implemented more robustly, but then
every piece already declared in the grammar (DTD or WXS) has to be taken
care of once again inside the SAX handler.

What do you guys think about this?

Error reported by SAXPrint sample of Xerces 2.5:
Error at file books_bad.xml, line 7, char 10
  Message: Element 'title' is not valid for content model:
'(author,title,price)'

Error reported by xmlproc of Python:
xml.sax._exceptions.SAXParseException: books_bad.xml:4:12: Element 'author'
missing before element 'title'


Thanks, Robert

WXS = W3C XML Schema

Robert Zimmermann
Softwaredevelopment
WEB.DE AG
http://ComWin.name/[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


