On Mon, Feb 25, 2008 at 09:45:13PM +0100, Petr Pajas wrote:
> Hi Daniel, All,
>
> the following inconsistency in DTD validation, reproducible with xmllint, was
> reported to me by a user of XSH2, Jakub Neburka.
>
> He takes two files: decl.dtd and decl.xml and does basically the following:
>
> 1) xmllint --valid decl.xml
>    xmllint --postvalid decl.xml
>
> both succeed.
>
> 2) xmllint --shell decl.xml
>    /> validate
>
> this, however, fails with
>
> decl.xml:5: element root: validity error : Element root was declared EMPTY
> this one has content
>
> (Probably because the library calls are alike, XSH2 behaves similarly:
> parse-time validation is fine, validating the in-memory tree fails).
>
> The test cases follow.
>
> __decl.dtd__
> <!ENTITY % cond "IGNORE">
> <![%cond;[
> <!ENTITY % content "ANY">
> ]]>
> <!ENTITY % content "EMPTY">
> <!ELEMENT root %content;>
> __CUT__
>
> __decl.xml__
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "decl.dtd" [
> <!ENTITY % cond "INCLUDE">
> ]>
> <root>content</root>
> __CUT__
>
> Can you confirm this is a bug? Shall I bugzilla it?

  Not a bug. When you do things like post-validation, you give the
validator a preparsed DTD. In that case the DTD was parsed without the
context of the document, whereas here the internal subset changes its
behaviour: it declares %cond; as INCLUDE before decl.dtd is read, so
%content; ends up as ANY instead of EMPTY.

  Basically, xmlValidateDtd(), or any validation using a DTD parsed out of
the context of the document, can't exactly match the behaviour of XML-1.0
validation, because XML-1.0 allows the document to modify the DTD.

  Actually, having a validation which depends only on the DTD/schemas, and
where the document can't modify the set of rules set by the receiver, is in
a lot of cases a good thing, if you consider a DTD/schema to be a contract
between a producer and a consumer of documents.

  If you want 100% of the DTD validation semantics described in the XML-1.0
spec, reparsing the document is, I think, the only option guaranteed to be
correct.

  Also note that the mismatch is documented in the libxml2 call:

/**
 * xmlValidateDtd:
 * @ctxt:  the validation context
 * @doc:  a document instance
 * @dtd:  a dtd instance
 *
 * Try to validate the document against the dtd instance
 *
 * Basically it does check all the definitions in the DtD.
 * Note that the internal subset (if present) is de-coupled
 * (i.e. not used), which could give problems if ID or IDREF
 * is present.
 *
 * returns 1 if valid or 0 otherwise
 */

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
                     | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
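
To make the difference concrete, here is a small toy model of how the
parameter entities in decl.dtd resolve, with and without the internal subset
from decl.xml. It is only a sketch and has nothing to do with how libxml2 is
implemented: the function names and the tuple-based representation of the
declarations are made up for illustration. It assumes only two XML-1.0 rules:
the internal subset is read before the external subset, and the first
declaration of a parameter entity is the binding one.

```python
# Toy model only; this is NOT libxml2 code. The names and the tuple
# representation below are hypothetical, chosen for illustration.

def resolve_root_model(internal_subset, external_subset):
    """Return the content model that <!ELEMENT root %content;> ends up with."""
    entities = {}

    def process(declarations):
        for decl in declarations:
            if decl[0] == "entity":
                _, name, value = decl
                # First declaration wins; later redeclarations are ignored.
                entities.setdefault(name, value)
            elif decl[0] == "conditional":
                _, switch, body = decl
                # <![%switch;[ ... ]]> is read only if it expands to INCLUDE.
                if entities.get(switch) == "INCLUDE":
                    process(body)

    process(internal_subset)   # read first, so its declarations bind
    process(external_subset)
    return entities["content"]


def root_is_valid(content_model, has_content):
    # EMPTY forbids any content; ANY accepts it.
    return content_model == "ANY" or not has_content


# decl.dtd from the report
DECL_DTD = [
    ("entity", "cond", "IGNORE"),
    ("conditional", "cond", [("entity", "content", "ANY")]),
    ("entity", "content", "EMPTY"),
]

# internal subset of decl.xml
INTERNAL_SUBSET = [("entity", "cond", "INCLUDE")]

with_document = resolve_root_model(INTERNAL_SUBSET, DECL_DTD)
standalone = resolve_root_model([], DECL_DTD)

# <root>content</root> has content, hence has_content=True
print("DTD parsed with the document:", with_document,
      root_is_valid(with_document, True))
print("DTD parsed on its own:", standalone,
      root_is_valid(standalone, True))
```

In this model the DTD read together with the document gives root the content
model ANY, while the same DTD read on its own gives EMPTY, which matches the
"declared EMPTY" error from the shell's validate command.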