Consider a document such as the following:

<root>
 <child1/>
 <child2>
</root>

Clearly it is malformed because the </child2> end-tag is missing. However, a streaming parser using SAX will still report startDocument(), startElement(root), characters(), startElement(child1), endElement(child1), characters(), and startElement(child2) before the malformedness is detected and a SAXParseException is thrown.

Or will it? In my tests with Xerces-J 2.5, I'm getting only startDocument() before a SAXParseException is thrown. The XML spec does not require a parser to throw away content found before the first well-formedness error. However, Xerces seems to be throwing it away for me, and I can't find anything in the SAX spec that says this is wrong. Not having guaranteed access to the well-formed initial section of the document really decreases the usefulness of a streaming API.
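For anyone who wants to reproduce this, here is a minimal sketch of the kind of harness involved. It goes through JAXP's SAXParserFactory, so it uses whatever parser is on the classpath (not necessarily Xerces-J 2.5), and the class name EventRecorder and its record() method are hypothetical. It logs each ContentHandler callback in order, then notes the SAXParseException, so you can see exactly how far the parser got before the fatal error. The output depends on the parser and version, so I won't guess at it here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test harness: records every SAX callback in the order received.
public class EventRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    // A single whitespace run may arrive in several chunks; each chunk is logged.
    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters()");
    }

    // Parses xml and returns the events seen. DefaultHandler.fatalError()
    // rethrows the SAXParseException, which is caught and logged last.
    static List<String> record(String xml) throws Exception {
        EventRecorder handler = new EventRecorder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            handler.events.add("SAXParseException at line " + e.getLineNumber());
        }
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // The malformed document from above: </child2> is missing.
        String xml = "<root>\n <child1/>\n <child2>\n</root>\n";
        for (String event : record(xml)) {
            System.out.println(event);
        }
    }
}
```

If the parser delivers content incrementally, the list shows the start tags and whitespace before the exception; if it shows only startDocument, the parser is withholding them.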

For my app, I would like to guarantee that all content before the first well-formedness error is reported via the normal mechanisms. Is this possible? Is this a good idea? Should SAX be rewritten to require this behavior? Or am I out to sea? Thoughts?


-- Elliotte Rusty Harold




