On Mon, Mar 29, 2010 at 09:21:56PM -0400, Ethan Tira-Thompson wrote:
> Thanks for all the information, I'll try to collate things :)
>
> > Failure to do so would just make the parser non-conformant to the XML-1.0
> > specification.
>
> Are you sure about this? Like I said, I'm not aware of the
> specification that it must be an error if more data follows the document.
> The spec does defines this extra data is not part of the document,
> but AFAIK not what you should do with/about it. It would better serve
> interoperability to simply ignore it and let the user decide if it's an
> issue, probably issuing a warning by default. But I'm no expert on the
> spec, it would be educational if you could point me to the section.

You get things backward, read the spec:

  http://www.w3.org/TR/REC-xml/#sec-documents

"Each XML document has both a logical and a physical structure.
 Physically, the document is composed of units called entities. An
 entity may refer to other entities to cause their inclusion in the
 document."

An entity is basically a file. In your case there is only one entity,
as you are not loading any external entity.

Now comes the definition of Well-Formed XML Documents:

  http://www.w3.org/TR/REC-xml/#sec-well-formed

"[Definition: A textual object is a well-formed XML document if:]

   1. Taken as a whole, it matches the production labeled document.
   2. It meets all the well-formedness constraints given in this
      specification.
   3. Each of the parsed entities which is referenced directly or
      indirectly within the document is well-formed.

 [1] document ::= prolog element Misc*
"

So the definition works like this: you give a textual object, and the
processor tells you whether it is well-formed. In that case you feed
the entity content, and the processor parses it. If it finds a second
root element you get a fatal error, and the *whole* thing is not a
well-formed document.

You can make all the theories you like about how the processor could
just ignore things or stop at a given point; that is just not how the
spec says an XML processor must be implemented. You will note the
"taken as a whole", which clearly indicates that it is absolutely
forbidden to stop applying the rules at some point. You feed the XML
parser what the entity (or entities) contains and it provides a result
back. If there is an error in the middle or at the end, it invalidates
the whole document.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
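
To make the "taken as a whole" rule concrete, here is a minimal sketch in
Python. It uses the standard library's expat-based xml.etree.ElementTree
rather than libxml2, purely so the example is self-contained; the helper
name is_well_formed and the sample strings are made up for illustration.
It shows that trailing comments and processing instructions (the Misc*
part of the production) are accepted, while a second root element or
stray text after the root makes the whole input fail.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the whole text parses as one XML document.

    Hypothetical helper: any parse error, wherever it occurs,
    rejects the entire input.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# prolog (empty here) + one root element
single_root = "<a><b/></a>"

# root element followed by Misc*: comment, PI, whitespace
trailing_misc = "<a/><!-- note --><?target data?>\n"

# a second root element does not match "prolog element Misc*"
second_root = "<a/><b/>"

# character data after the root is not Misc either
trailing_text = "<a/>stray text"

if __name__ == "__main__":
    for name, sample in [
        ("single_root", single_root),
        ("trailing_misc", trailing_misc),
        ("second_root", second_root),
        ("trailing_text", trailing_text),
    ]:
        print(name, is_well_formed(sample))
```

The first two samples should be accepted and the last two rejected. The
parser does not hand back the first element and quietly ignore the rest;
the error at the end invalidates the whole input.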
