Hi again,

any response on this? Through IRC I was pointed to a couple of bugs in the Red Hat Bugzilla suggesting that reimplementing this feature would require big changes to the code, so I understand if nobody wants to do it. I'm considering fixing this myself, but I don't really want to spend that much time on it either. However, I would at least appreciate a negative answer.

Best regards,
Ondrej Lichtner

On Mon, Nov 18, 2013 at 12:59:46PM +0100, Ondrej Lichtner wrote:
> Hi everyone,
>
> in our project we've recently started using a RelaxNG schema to validate
> our XML documents through the lxml python bindings of libxml2. However,
> sometimes the errors reported for invalid documents are very unhelpful
> and even we as developers get confused and have to spend a few minutes
> looking for what's actually wrong. To demonstrate, I simplified our
> schema and an invalid xml document with a simple python script that I've
> appended to this email. The script is not needed, running
>     xmllint --relaxng schema.rng test.xml
> will produce the same results.
>
> The error that libxml reports is:
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth
>
> which is incorrect since the actual error is that the eth element is
> missing a mandatory attribute.
>
> What's also interesting is that if you completely remove the definition
> and use of the "define" element in the schema (the test.xml doesn't use
> it so it can stay the same), the error stack changes to:
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_ATTRVALID: Element eth failed to validate attributes
> test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth
>
> Which is a reasonable error message, even though it would be a bit more
> user friendly if there was some kind of information about which
> attributes failed or are missing, but I can understand that...
>
> There are a few more scenarios where similar problems occur, I can
> describe them if needed, but to keep this email shorter I will ignore
> them for now. I've also found a few bug reports that describe similar
> situations, but since they were last updated several years ago I
> first wanted to write here before reviving them.
>
> So I've done some digging around and figured out that all of these
> imprecise error reports are related to <interleave>, <optional> and
> <choice>, i.e. rules that can easily cause non-determinism. If the
> non-determinism is handled with some kind of backtracking, these kinds of
> problems could arise. The other way is to create a finite automaton that
> can always be determinized, solving this problem. I looked through the
> libxml sources and found that a finite automaton is in fact created,
> however I didn't find anything related to its determinization so I'm
> assuming there isn't anything. I apologize if I've missed something but
> it's a fairly long source file...
>
> I want to ask if this is a bug you would find worth fixing or if the
> current behaviour is intended (since the bugs in the bug tracker are 5+
> years old).
> If not I might consider fixing this myself but I would like at least
> some comments on whether the implementation of the determinization would
> be possible to integrate with how the validation is currently handled.
>
> Thanks for your reply!
>
> Best regards,
> Ondrej Lichtner
>
> --------------------------------------
> schema.rng:
> --------------------------------------
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
>     <start>
>         <element name="host">
>             <attribute name="id"/>
>
>             <interleave>
>                 <zeroOrMore>
>                     <ref name="params"/>
>                 </zeroOrMore>
>
>                 <element name="interfaces">
>                     <zeroOrMore>
>                         <ref name="eth"/>
>                     </zeroOrMore>
>                 </element>
>             </interleave>
>         </element>
>     </start>
>
>     <define name="define">
>         <element name="define">
>             <oneOrMore>
>                 <element name="alias">
>                     <attribute name="name"/>
>                     <choice>
>                         <attribute name="value"/>
>                         <text/>
>                     </choice>
>                 </element>
>             </oneOrMore>
>         </element>
>     </define>
>
>     <define name="eth">
>         <element name="eth">
>             <attribute name="id"/>
>             <attribute name="label"/>
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
>
>                 <zeroOrMore>
>                     <ref name="params"/>
>                 </zeroOrMore>
>
>                 <optional>
>                     <ref name="addresses"/>
>                 </optional>
>             </interleave>
>         </element>
>     </define>
>
>     <define name="addresses">
>         <element name="addresses">
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
>
>                 <zeroOrMore>
>                     <element name="address">
>                         <choice>
>                             <attribute name="value"/>
>                             <text/>
>                         </choice>
>                     </element>
>                 </zeroOrMore>
>             </interleave>
>         </element>
>     </define>
>
>     <define name="params">
>         <element name="params">
>             <interleave>
>                 <optional>
>                     <ref name="define"/>
>                 </optional>
>
>                 <zeroOrMore>
>                     <element name="param">
>                         <attribute name="name"/>
>                         <choice>
>                             <attribute name="value"/>
>                             <text/>
>                         </choice>
>                     </element>
>                 </zeroOrMore>
>             </interleave>
>         </element>
>     </define>
> </grammar>
>
> --------------------------------------
> test.xml:
> --------------------------------------
> <host id="slave1">
>     <interfaces>
>         <eth label="A">
>             <addresses>
>                 <address value="192.168.100.1/24"/>
>             </addresses>
>         </eth>
>     </interfaces>
> </host>
> --------------------------------------
> test.py:
> --------------------------------------
> #!/usr/bin/python
> from lxml import etree
> from pprint import pprint
>
> relaxng_doc = etree.parse("schema.rng")
> schema = etree.RelaxNG(relaxng_doc)
>
> doc = etree.parse("test.xml")
> schema.validate(doc)
> pprint(schema.error_log)
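
To make the determinization idea from the quoted message concrete, here is a minimal sketch of the textbook subset construction in plain Python. It is not libxml2's code and does not model attributes or <interleave>; the function names, the toy automaton and the symbols "a", "b", "c" are all made up for illustration. The toy NFA accepts the child sequences "a b" and "a c", so after reading "a" a backtracking matcher has to guess a branch, while the determinized automaton tracks both branches at once and can say at which position no continuation exists.

```python
from collections import deque


def determinize(nfa, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    symbols = {symbol for (_, symbol) in nfa}
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in symbols:
            # Union of all NFA moves on this symbol from the current subset.
            target = frozenset(
                nxt
                for state in current
                for nxt in nfa.get((state, symbol), ())
            )
            if not target:
                continue  # no transition on this symbol
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def accepts(dfa, dfa_start, dfa_accepting, word):
    """Return (matched, position), where position is where matching stopped."""
    state = dfa_start
    for position, symbol in enumerate(word):
        nxt = dfa.get((state, symbol))
        if nxt is None:
            return False, position
        state = nxt
    return state in dfa_accepting, len(word)


# Hypothetical toy NFA: state 0 branches non-deterministically on "a".
nfa = {
    (0, "a"): {1, 2},
    (1, "b"): {3},
    (2, "c"): {3},
}
dfa, dfa_start, dfa_accepting = determinize(nfa, 0, {3})

print(accepts(dfa, dfa_start, dfa_accepting, ["a", "c"]))  # (True, 2)
print(accepts(dfa, dfa_start, dfa_accepting, ["a", "d"]))  # (False, 1)
```

In this toy model the second call fails at index 1, i.e. at the first symbol that no branch can consume, rather than at whatever point a particular backtracking order happened to give up. Whether the same approach can be fitted into the existing validation code is exactly the open question of the message above.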