Re: URL work in HTML 5 (semifork)

Martin J. Dürst Mon, 15 Oct 2012 22:37:51 -0700

On 2012/10/16 1:30, Robin Berjon wrote:

On 15/10/2012 17:49 , Ted Hardie wrote:

On Mon, Oct 15, 2012 at 8:07 AM, Robin Berjon <[email protected]> wrote:

URLs to non-Web things (e.g. mailto:, smsto:, tel:, etc.) happen in Web
contexts. Libraries written to process those in Web contexts are likely to
be reused elsewhere. There isn't really an option to have some of this in
Web use cases and something else outside of it. If it's used for the Web, it
*will* leak. Probably a lot, and probably fast.

One first question is how much we want it to leak. An example that Anne brought up is a URL with a space character in it. It is clear that these things exist on the Web, in not too small numbers. On the other hand, it's also clear that there are many places (some of them defined by specs, some of them just somewhere in scripts and the like) that will just 'blow up' when they get a space.

Do we want to make sure that all browsers treat such a space in the same way? Most probably yes, and in this case, maybe they already do. Does it make sense to write that down? I'd also very much say yes.

Do we want to make sure that all other places that accept URIs or IRIs also accept a space and treat it the same? Maybe we would like to do so, but is it possible? Quite clearly no (just think HTTP request header).

This essentially means that the fork is already here. In some sense, that's really bad news. But if we look more closely, the news may not be that bad. First, at least for the case with the space in it, we know how to convert it to an equivalent without a space: use %20 (except maybe in form parts). But we need to make sure that this is written down somewhere.
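
A minimal sketch of that conversion in Python, for illustration only. The helper name escape_spaces and the example.org URL are made up here and are not taken from any of the specs under discussion. The last two lines show why form parts are a possible exception: form-style encoding conventionally writes a space as "+" rather than "%20".

```python
from urllib.parse import quote, quote_plus


def escape_spaces(url: str) -> str:
    """Replace each literal space with %20 and leave everything else untouched.

    Hypothetical helper: it does not validate or otherwise repair the URL.
    """
    return url.replace(" ", "%20")


# A URL with a raw space, as found "in the wild".
print(escape_spaces("http://example.org/my file.html"))

# Generic percent-encoding versus form-style encoding of the same string.
print(quote("my file"))       # space becomes %20
print(quote_plus("my file"))  # space becomes +
```

A receiver that only accepts RFC 3986 syntax (an HTTP request line, for example) can handle the first form but not the raw space.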

Second, and that will be more obvious for some more esoteric cases than just a space, I think that even among those who agree that such cases should be described, and should be handled uniformly by browsers, there will be quite some agreement that it's better not to produce such things.

What we end up with is something I'd call a semi-fork, which is a subset of "recommended" URIs/IRIs within a larger set of (sometimes, but not always) tolerated ones.

We already have this for the XML case, it's called LEIRIs (http://tools.ietf.org/html/draft-ietf-iri-3987bis-12#section-6).

At one point, we tried to do something similar to what Anne is now trying to address, but we did not get very far because once one goes beyond the simple cases (such as a space), it gets messy quite quickly (read: different browsers do different things). Even though there are representatives of all major browser vendors subscribed to the IRI WG mailing list, we also didn't get much in terms of contributions or feedback (Adam and Anne occasionally were exceptions).

I agree. But that argues that an xmpp URI seen in a jabber context
and an xmpp URI seen in a web context should be the same;

Syntactically correct xmpp URIs should be the same indeed, and I think they currently are.

or, to
re-iterate, that a fork would be harmful. Changing the URI parsing in
web contexts only is likely to be problematic because of leakage.
Avoiding that by retaining one way is my personal preference for the
way forward. But if those working on web-specific specs do not agree
and choose to fork, then we *must* mark the difference between the
contexts, or the results will be even worse.


I think that we're in ruthlessly violent agreement here :)

At this point we have to look at what status Anne's work could be
published under. It doesn't have to be a fork, it could simply be
published as The One True Way to parse URLs (after reviews, etc.
obviously). Is that something that could be acceptable?

I think it can easily be the One True Way to parse URLs in Web Browsers. Given some of the current differences between browsers, even that may be tough, but I very much hope that Anne can be successful.

I think that in a way similar to how the HTML5 spec currently distinguishes between an authoring version and a parsing version, Anne's document can be the parsing version for Web browsers, and RFC 3986, and 3987bis, can be the authoring version(s).

Of course, that's not a strict parallel. As an example, Anne plans to clearly document/spec how URL equivalence works in JavaScript. For everybody who uses JavaScript, this will clearly be a good thing. However, as http://tools.ietf.org/html/rfc3986#section-6, http://tools.ietf.org/html/rfc3987#section-5, and http://tools.ietf.org/html/draft-ietf-iri-comparison-01 should make quite clear, how to compare URIs/IRIs/URLs depends very much on the application. On one end, a spider will make as many shortcuts as possible, where on the other end, XML namespaces and RDF will do codepoint-by-codepoint comparison, and there is clearly some value in documenting that. (Also, an extended JavaScript library may provide quite a few variants to deal with these application needs.)
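
To make the two ends of that spectrum concrete, here is a rough Python sketch. It is an illustration under stated assumptions, not the comparison algorithm of RFC 3986, of 3987bis, or of Anne's document. The function names, the DEFAULT_PORTS table, and the particular normalization steps (lowercase scheme and host, drop a default port, treat an empty path as "/") are choices made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for this sketch; real code would cover more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def same_codepoints(a: str, b: str) -> bool:
    """Strict comparison, in the style used for XML namespaces and RDF."""
    return a == b


def normalize(url: str) -> str:
    """Apply a few 'spider-style' shortcuts before comparing.

    Simplifications: userinfo is dropped, IPv6 literals are not handled,
    and percent-encoding is left untouched.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_after_normalization(a: str, b: str) -> bool:
    """Looser comparison that accepts differences removed by normalize()."""
    return normalize(a) == normalize(b)


a = "HTTP://Example.ORG:80/a"
b = "http://example.org/a"
print(same_codepoints(a, b))           # False: the strings differ
print(same_after_normalization(a, b))  # True: same resource for a spider
```

The same pair of strings is "different" for a namespace processor and "equal" for a crawler, which is why a single definition of equivalence cannot serve every application.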

Last but not least, I would like to mention that if there's anything that we can reasonably do to make the gap in the semifork narrower, then we should give it a try. Two examples: First, RFC 3987 was quite strict about character normalization in some circumstances. It has turned out that browsers did it differently, so we changed the spec. Also, we had to find out that query parts don't get converted using UTF-8 as often as we would like. So we also adapted the spec, even though that's still under discussion. If there are other cases that we *can* address, please tell us. On the other hand, I'd hope that with the work that Anne does, he also tries to narrow the gap where possible, e.g. by choosing a solution closer to RFC 3986/3987bis where browsers disagree.



Regards,   Martin.
