On Fri, Nov 29, 2013 at 11:53:17PM +0100, Jan Pokorný wrote:
> $ cat small.rng
> <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
>          xmlns="http://relaxng.org/ns/structure/1.0">
>
> <start>
>   <element name="script">
>     <attribute name="file">
>       <data type="token">
>         <except>
>           <data type="token">
>             <param
>               name="pattern">/etc/(rc\.d/)?init\.d/cman</param>
>           </data>
>         </except>
>       </data>
>     </attribute>
>   </element>
> </start>
>
> </grammar>
>
> ---
>
> $ cat testcase.xml
> <script file=" /etc/rc.d/init.d/cman "/>
>
> ---
>
> before (bug in question present):
>
> $ xmllint --noout --relaxng small.rng testcase.xml
> testcase.xml validates
>
> desired (hopefully, this is not a false assumption, this is also
> the behavior of jing or xmllint when the attribute value
> is whitespace-normalized manually):
>
> $ xmllint --noout --relaxng small.rng testcase.xml
> testcase.xml fails to validate
>
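
To make the expected outcome of the quoted test case concrete, here is a
minimal C sketch of the check, assuming the attribute value is first
whitespace-collapsed and only then matched against the pattern. This is not
libxml2 code: the demo_* names are hypothetical, and POSIX regex.h stands in
for the XML Schema regex engine (the ^...$ anchors emulate the implicit
anchoring of XML Schema patterns).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Collapse XML Schema whitespace in place: #x9, #xA and #xD count as
 * spaces, runs of spaces shrink to one, leading/trailing ones vanish. */
static void demo_collapse(char *s)
{
    char *out = s;
    int pending_space = 0;

    assert(s != NULL);
    for (const char *in = s; *in != '\0'; in++) {
        if (*in == ' ' || *in == '\t' || *in == '\n' || *in == '\r') {
            /* remember a separator, but only after some text was emitted */
            pending_space = (out != s);
        } else {
            if (pending_space)
                *out++ = ' ';
            pending_space = 0;
            *out++ = *in;
        }
    }
    *out = '\0';
}

/* 1 if the whole value matches the pattern from small.rng, 0 if it does
 * not, -1 if the regex could not be compiled. */
static int demo_matches_pattern(const char *value)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^/etc/(rc\\.d/)?init\\.d/cman$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, value, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The grammar accepts any token EXCEPT one matching the pattern.
 * Returns 1 (validates), 0 (fails to validate) or -1 (error). */
static int demo_validates(const char *raw)
{
    char buf[256];
    int matched;

    if (strlen(raw) >= sizeof(buf))
        return -1;
    strcpy(buf, raw);
    demo_collapse(buf);               /* work on the value space */
    matched = demo_matches_pattern(buf);
    return matched < 0 ? -1 : !matched;
}
```

With this sketch, demo_validates(" /etc/rc.d/init.d/cman ") yields 0, which
corresponds to the "fails to validate" outcome listed as desired above.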
Best is to go back to the spec to get a normative answer. token is defined at
http://www.w3.org/TR/xmlschema-2/#token

  [Definition:] token represents tokenized strings. The ·value space· of
  token is the set of strings that do not contain the carriage return (#xD),
  line feed (#xA) nor tab (#x9) characters, that have no leading or trailing
  spaces (#x20) and that have no internal sequences of two or more spaces.

The definition is based on the value space, and indeed the value obtained
after stripping the surrounding spaces from the file path is a token.
Then let's check pattern:

  [Definition:] pattern is a constraint on the ·value space· of a datatype
  which is achieved by constraining the ·lexical space· to literals which
  match a specific pattern. The value of pattern ·must· be a ·regular
  expression·.

Again, the regexp must be tested on the value space of the datatype, i.e.
the string with the extra whitespace trimmed.

>
> The patch fixes the issue, but I must admit it's more like the easiest
> solution I was able to achieve, not necessarily a proper one (also
> considering the various contexts the affected code can be run in).
>
> Generally, it seems that some relevant parts of the code are affected
> by some change trying to be backwards compatible;
> from xmlSchemaValidateFacetWhtsp (the originally used function?):
>
> > Note that @value needs to be the *normalized* value if the facet
> > is of type "pattern".
>
> Please let me know if I can help somehow to get the test case passing.
> If agreed, I will also turn it to the proper part of the test suite.
>
> And yes, test suite still passes.

I think the patch is correct. I think we could improve it to use the value
space carried in val->value.str for all the types derived from 'string',
including token. A revised, improved patch would make the enum values
explicit in include/libxml/schemasInternals.h, i.e.

  XML_SCHEMAS_UNKNOWN = 0,
  XML_SCHEMAS_STRING = 1,
  XML_SCHEMAS_NORMSTRING = 2,
  ...

and then use a two-range comparison instead of val->type == XML_SCHEMAS_TOKEN
in the test added to xmlSchemaValidateFacetInternal.

That, plus adding the test case, would be a good way to improve the patch.
I would be fine applying it as-is, but if you can build a better version
as described above, that would be welcome!

thanks,

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
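
For illustration only, the range test suggested above could look roughly like
the following C sketch. The DEMO_* names are hypothetical stand-ins, and every
number past the first three is an assumption made for the example; the real
constants and their order must be taken from
include/libxml/schemasInternals.h.

```c
#include <assert.h>

/* Hypothetical mirror of a value-type enum with explicit numbering.
 * Only 0, 1 and 2 come from the mail; the rest is illustrative. */
typedef enum {
    DEMO_UNKNOWN    = 0,
    DEMO_STRING     = 1,
    DEMO_NORMSTRING = 2,
    DEMO_DECIMAL    = 3,   /* assumed: first non-string type */
    DEMO_BOOLEAN    = 15,  /* assumed */
    DEMO_TOKEN      = 16,  /* assumed: start of second string block */
    DEMO_LANGUAGE   = 17,  /* assumed */
    DEMO_NMTOKEN    = 18,  /* assumed: end of second string block */
    DEMO_INTEGER    = 30   /* assumed */
} demo_val_type;

/* Returns 1 when the type falls into one of the two contiguous blocks of
 * string-derived types, 0 otherwise. Explicit enum values are what make
 * such range checks safe against accidental reordering. */
static int demo_is_string_derived(demo_val_type t)
{
    assert(t >= DEMO_UNKNOWN);
    return (t >= DEMO_STRING && t <= DEMO_NORMSTRING) ||
           (t >= DEMO_TOKEN && t <= DEMO_NMTOKEN);
}
```

A caller would then switch to the normalized string only when
demo_is_string_derived() is true, instead of testing for the token type
alone.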