In WA 1.0 and WF 2.0 some values are required to be IRIs and some values are required to be IRI references. I'm confused about what exactly this means in terms of conformance checking. (WF 2.0 does say something about processing in a browser, though.)

First, I was amazed to learn that, for pure non-infoset-augmenting validation, the xsd:anyURI datatype does not mean anything useful beyond token, and that it is not exactly an IRI reference.
http://www.imc.org/atom-syntax/mail-archive/msg17990.html
http://www.mail-archive.com/rng-users@yahoogroups.com/msg00350.html

Having read
http://www.w3.org/TR/xlink/#link-locators
I started to suspect that just about every string can indeed be considered a sort of IRI reference that can be munged into an IRI reference, so there's nothing to check.
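
To make the "nothing to check" worry concrete, here is a minimal Python sketch of the escaping step as I read the XLink locator section: non-ASCII characters and the RFC 2396 "excluded" characters (except '#', '%', '[' and ']') are converted to UTF-8 and each byte is written as %HH. The name xlink_escape and the exact character set are my own illustration and may not match the spec in every corner case; lone surrogates are not handled.

```python
# Hypothetical helper, not from any library: escapes a string the way
# the XLink 1.0 locator section appears to describe.
# ASCII characters treated as disallowed here: space plus the RFC 2396
# "delims" and "unwise" sets, minus '#', '%', '[' and ']'.
DISALLOWED_ASCII = set('<>"{}|\\^` ')


def xlink_escape(value: str) -> str:
    out = []
    for ch in value:
        cp = ord(ch)
        # Controls, DEL, non-ASCII and the listed ASCII characters are escaped.
        if cp < 0x20 or cp > 0x7E or ch in DISALLOWED_ASCII:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(xlink_escape("a b/caf\u00e9"))
print(xlink_escape("x#y%20[z]"))
```

Note that the function never rejects its input: every string comes out as something that looks escaped, which is exactly why this step alone gives a conformance checker nothing to flag. It also leaves a stray "%zz" untouched, so the output is not guaranteed to be a valid URI reference.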

Then I found
http://jena.sourceforge.net/tmp/javadoc/com/hp/hpl/jena/iri/IRIFactory.html which provides a fascinating number of enforcement options. I could write a custom datatype wrapper for it, but I don't know which options to use.

I'd appreciate some guidance on which enforcement options to use. (E.g. should knowledge of the http scheme be used? Should security issues be flagged as non-conforming? Should "SHOULD" violations be flagged as non-conforming? Etc.)



(This is the first time I venture into the world of IRIs. I have intuitively thought that they are trouble, so I have knowingly avoided minting non-URI IRIs myself.

I suspected that bad stuff happens with IRIs containing decomposed character sequences. (These can be found in the URI form due to HFS+-backed Apache setups.) Now that I've read the RFC, I think it is a very bad idea to allow decomposed characters in IRIs, and also a bad idea that the RFC does not require percent-encoding of character sequences that are not invariant under NFC.

This may have relevance to how the WF 2.0 url input works. That is, it probably SHOULD (MUST?) NOT percent-decode URIs when doing so would result in IRIs that are not invariant under NFC.)
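
A rough Python sketch of the check I have in mind, using only the standard library. The function names is_nfc_invariant and safe_to_percent_decode are hypothetical, and the approach is deliberately coarse: it decodes every percent escape (including reserved characters such as %2F), which a real implementation would probably not want to do.

```python
import unicodedata
from urllib.parse import unquote


def is_nfc_invariant(text: str) -> bool:
    # A string is invariant under NFC if normalizing it changes nothing.
    return unicodedata.normalize("NFC", text) == text


def safe_to_percent_decode(uri: str) -> bool:
    # Assumption: escapes are meant as UTF-8; anything else is refused.
    try:
        decoded = unquote(uri, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return is_nfc_invariant(decoded)


# Precomposed e-acute (U+00E9) survives NFC unchanged.
print(safe_to_percent_decode("http://example.org/caf%C3%A9"))
# "e" followed by U+0301, as an HFS+-backed server might emit it.
print(safe_to_percent_decode("http://example.org/cafe%CC%81"))
```

Under these assumptions the first URI could be shown in decoded form, while the second would be left percent-encoded, since decoding it would produce a decomposed sequence that NFC would rewrite.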

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

