On Mon, Feb 25, 2008 at 09:45:13PM +0100, Petr Pajas wrote:
> Hi Daniel, All, 
> 
> the following inconsistency in DTD validation, reproducible with xmllint, was 
> reported to me by a user of XSH2, Jakub Neburka.
> 
> He takes two files: decl.dtd and decl.xml and does basically the following:
> 
> 1) xmllint --valid decl.xml
>    xmllint --postvalid decl.xml
> 
> both succeed.
> 
> 2) xmllint --shell decl.xml
> /> validate
> 
> this, however, fails with
> 
> decl.xml:5: element root: validity error : Element root was declared EMPTY 
> this one has content
> 
> (Probably because the library calls are alike, XSH2 behaves similarly: 
> parse-time validation is fine, validating the in-memory tree fails).
> 
> The test cases follow.
> 
> __decl.dtd__
> <!ENTITY % cond "IGNORE">
> <![%cond;[
> <!ENTITY % content "ANY">
> ]]>
> <!ENTITY % content "EMPTY">
> <!ELEMENT root %content;>
> __CUT__
> 
> __decl.xml__
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "decl.dtd" [
> <!ENTITY % cond "INCLUDE">
> ]>
> <root>content</root>
> __CUT__
> 
> Can you confirm this is a bug? Shall I bugzilla it?

  Not a bug. When you do things like post-validation, you give it a
preparsed DTD. In that case the DTD was parsed without the context
of the document, while here the internal subset changes the behaviour:
decl.xml sets %cond; to INCLUDE, so the first (and hence binding)
declaration of %content; is ANY, whereas decl.dtd parsed on its own
keeps %cond; as IGNORE and ends up with EMPTY.
Basically xmlValidateDtd(), or any validation using a DTD parsed out
of the context of the document, can't exactly match the behaviour of
XML-1.0 validation, because XML-1.0 allows the document to modify the
DTD.
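
  A minimal sketch of why the context matters, written as plain C with
no libxml2 involved. It models only the two XML-1.0 rules at work here:
the internal subset is processed before the external subset, and the
first declaration of a parameter entity is binding. All names (struct
decl, effective_content_model() and so on) are hypothetical, made up
for illustration, and the model only understands the declarations that
appear in decl.dtd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one parameter-entity declaration.
 * cond_guard != NULL means the declaration sits inside a
 * <![%cond_guard;[ ... ]]> conditional section. */
struct decl {
    const char *name;
    const char *value;
    const char *cond_guard;
};

/* The declarations of decl.dtd, in document order. */
static const struct decl external_subset[] = {
    { "cond",    "IGNORE", NULL   },
    { "content", "ANY",    "cond" },  /* inside <![%cond;[ ... ]]> */
    { "content", "EMPTY",  NULL   },
};

#define MAX_ENTITIES 8

/* Hypothetical entity table: only the first declaration of a name is kept. */
struct table {
    const char *names[MAX_ENTITIES];
    const char *values[MAX_ENTITIES];
    size_t count;
};

static const char *lookup(const struct table *t, const char *name) {
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return t->values[i];
    return NULL;
}

static void declare(struct table *t, const char *name, const char *value) {
    assert(t->count < MAX_ENTITIES);
    if (lookup(t, name) != NULL)
        return;                      /* first declaration is binding */
    t->names[t->count] = name;
    t->values[t->count] = value;
    t->count++;
}

/* Returns what %content; expands to in <!ELEMENT root %content;>.
 * internal_cond is the value of %cond; from the internal subset,
 * or NULL when the DTD is parsed on its own (no internal subset). */
static const char *effective_content_model(const char *internal_cond) {
    struct table t = { .count = 0 };

    /* The internal subset is processed before the external subset. */
    if (internal_cond != NULL)
        declare(&t, "cond", internal_cond);

    for (size_t i = 0; i < sizeof external_subset / sizeof external_subset[0]; i++) {
        const struct decl *d = &external_subset[i];
        if (d->cond_guard != NULL) {
            const char *kw = lookup(&t, d->cond_guard);
            if (kw == NULL || strcmp(kw, "INCLUDE") != 0)
                continue;            /* IGNORE section: declaration skipped */
        }
        declare(&t, d->name, d->value);
    }
    return lookup(&t, "content");
}
```

  Passing "INCLUDE" (what decl.xml declares) yields "ANY", so
<root>content</root> is valid; passing NULL (decl.dtd parsed out of
context) yields "EMPTY", which matches the error from the shell
validate command.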
  Actually, a validation which depends only on the DTD/schemas, and
where the document can't modify the set of rules set by the receiver,
is in a lot of cases a good thing if you consider a DTD/schema to be
a contract between a producer and a consumer of documents.
  If you want 100% of the DTD validation semantics described in the
XML-1.0 spec, reparsing the document with validation enabled is, I
think, the only guaranteed correct option.
  Also note that the mismatch is documented in the libxml2 API
documentation for xmlValidateDtd():
/**
 * xmlValidateDtd:
 * @ctxt:  the validation context
 * @doc:  a document instance
 * @dtd:  a dtd instance
 *
 * Try to validate the document against the dtd instance
 *
 * Basically it does check all the definitions in the DTD.
 * Note that the internal subset (if present) is de-coupled
 * (i.e. not used), which could give problems if ID or IDREF
 * is present.
 *
 * returns 1 if valid or 0 otherwise
 */

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
