Hi again,

is there any response on this? On IRC I was pointed to a couple of bugs
in the Red Hat Bugzilla which suggest that reimplementing this feature
would require big changes to the code, so I understand if nobody wants
to do it. I'm considering fixing this myself, but I don't really want to
spend that much time on it either. However, I would at least appreciate
an answer, even a negative one.

Best regards,
Ondrej Lichtner

On Mon, Nov 18, 2013 at 12:59:46PM +0100, Ondrej Lichtner wrote:
> Hi everyone,
> 
> in our project we've recently started using a RelaxNG schema to validate
> our XML documents through lxml, the Python bindings for libxml2. However,
> the errors reported for invalid documents are sometimes very unhelpful:
> even we as developers get confused and have to spend a few minutes
> looking for what's actually wrong. To demonstrate, I have appended a
> simplified version of our schema, an invalid XML document and a simple
> Python script to this email. The script is not strictly needed; running
> xmllint --relaxng schema.rng test.xml
> will produce the same results.
> 
> The error that libxml reports is:
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth
> 
> which is incorrect since the actual error is that the eth element is
> missing a mandatory attribute.
> 
> What's also interesting is that if you completely remove the definition
> and all uses of the "define" element from the schema (test.xml doesn't
> use it, so it can stay the same), the error stack changes to:
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_ATTRVALID: Element eth failed to validate attributes
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth
> 
> This is a reasonable error message, although it would be more
> user-friendly if it said which attributes failed validation or are
> missing, but I can understand that...
> 
> There are a few more scenarios where similar problems occur. I can
> describe them if needed, but to keep this email shorter I will ignore
> them for now. I've also found a few bug reports that describe similar
> situations, but since they were last updated several years ago I
> wanted to write here first before reviving them.
> 
> So I've done some digging around and figured out that all of these
> imprecise error reports are related to <interleave>, <optional> and
> <choice>, i.e. rules that can easily cause non-determinism. If the
> non-determinism is handled with some kind of backtracking, these kinds
> of problems can arise. The other way is to create a finite automaton,
> which can always be determinized, solving this problem. I looked through
> the libxml sources and found that a finite automaton is in fact created;
> however, I didn't find anything related to its determinization, so I'm
> assuming there isn't any. I apologize if I've missed something, but
> it's a fairly long source file...
> 
> I want to ask whether this is a bug you would find worth fixing or
> whether the current behaviour is intended (since the bugs in the bug
> tracker are 5+ years old).
> If not, I might consider fixing this myself, but I would like at least
> some comments on whether determinization could be integrated with how
> the validation is currently handled.
> 
> Thanks for your reply!
> 
> Best regards,
> Ondrej Lichtner
> 
> --------------------------------------
> schema.rng:
> --------------------------------------
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>     datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
>     <start>
>         <element name="host">
>             <attribute name="id"/>
> 
>             <interleave>
>                 <zeroOrMore>
>                     <ref name="params"/>
>                 </zeroOrMore>
> 
>                 <element name="interfaces">
>                     <zeroOrMore>
>                         <ref name="eth"/>
>                     </zeroOrMore>
>                 </element>
>             </interleave>
>         </element>
>     </start>
> 
>     <define name="define">
>         <element name="define">
>             <oneOrMore>
>                 <element name="alias">
>                     <attribute name="name"/>
>                     <choice>
>                         <attribute name="value"/>
>                         <text/>
>                     </choice>
>                 </element>
>             </oneOrMore>
>         </element>
>     </define>
> 
>     <define name="eth">
>         <element name="eth">
>             <attribute name="id"/>
>             <attribute name="label"/>
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
> 
>                 <zeroOrMore>
>                     <ref name="params"/>
>                 </zeroOrMore>
> 
>                 <optional>
>                     <ref name="addresses"/>
>                 </optional>
>             </interleave>
>         </element>
>     </define>
> 
>     <define name="addresses">
>         <element name="addresses">
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
> 
>                 <zeroOrMore>
>                     <element name="address">
>                         <choice>
>                             <attribute name="value"/>
>                             <text/>
>                         </choice>
>                     </element>
>                 </zeroOrMore>
>             </interleave>
>         </element>
>     </define>
> 
>     <define name="params">
>         <element name="params">
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
> 
>                 <zeroOrMore>
>                     <element name="param">
>                         <attribute name="name"/>
>                         <choice>
>                             <attribute name="value"/>
>                             <text/>
>                         </choice>
>                     </element>
>                 </zeroOrMore>
>             </interleave>
>         </element>
>     </define>
> </grammar>
> 
> --------------------------------------
> test.xml:
> --------------------------------------
> <host id="slave1">
>     <interfaces>
>         <eth label="A">
>             <addresses>
>                 <address value="192.168.100.1/24"/>
>             </addresses>
>         </eth>
>     </interfaces>
> </host>
> --------------------------------------
> test.py:
> --------------------------------------
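> #!/usr/bin/python
> from lxml import etree
> from pprint import pprint
> 
> # Compile the RelaxNG schema from its XML representation.
> relaxng_doc = etree.parse("schema.rng")
> schema = etree.RelaxNG(relaxng_doc)
> 
> # Validate the test document; validate() returns False for invalid input.
> doc = etree.parse("test.xml")
> schema.validate(doc)
> # Print the errors collected during validation.
> pprint(schema.error_log)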
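
A note on the determinization idea in the quoted message: the following
is a minimal, generic sketch of the textbook subset construction. It is
not libxml2's code and not a proposal for how its RelaxNG validator
should be changed. The names determinize, accepts, NFA, START and
ACCEPTING are made up for illustration, and the tiny automaton (a choice
between "a b" and "a c") is a hypothetical content model unrelated to the
schema above. It only shows why a determinized automaton can point at the
first offending input item without backtracking.

```python
from collections import deque

# Hypothetical NFA for the choice ("a b" | "a c"): reading "a" from state 0
# is non-deterministic, because both alternatives start with "a".
NFA = {
    (0, "a"): {1, 2},
    (1, "b"): {3},
    (2, "c"): {3},
}
START = 0
ACCEPTING = {3}


def determinize(nfa, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) to a set of successor states. Each state of
    the resulting DFA is a frozenset of NFA states.
    """
    symbols = {symbol for (_, symbol) in nfa}
    dfa_start = frozenset([start])
    dfa = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in symbols:
            # Union of all NFA successors reachable on this symbol.
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa.get((state, symbol), ())
            )
            if not target:
                continue  # no transition: the DFA rejects on this symbol
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state for state in seen if state & accepting}
    return dfa, dfa_start, dfa_accepting


def accepts(dfa, dfa_start, dfa_accepting, word):
    """Run the DFA once over word; return (ok, index where the run stopped)."""
    state = dfa_start
    for position, symbol in enumerate(word):
        state = dfa.get((state, symbol))
        if state is None:
            return False, position  # first item no alternative can match
    return state in dfa_accepting, len(word)


if __name__ == "__main__":
    dfa, dfa_start, dfa_accepting = determinize(NFA, START, ACCEPTING)
    print(accepts(dfa, dfa_start, dfa_accepting, ["a", "c"]))
    print(accepts(dfa, dfa_start, dfa_accepting, ["a", "d"]))
```

For the input ["a", "d"], accepts() reports a failure at index 1, the
first item that neither alternative can match, which is the kind of
position a more precise error message could be attached to.
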
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
