On Sun, 11 Sep 2011 11:30:07 -0400, Daniel Holbert <dholb...@mozilla.com> wrote:

> On 09/11/2011 07:21 AM, Michael A. Puls II wrote:
>> Not only must "#" be "%23" if you don't want it as a frag id, but ">"
>> and "<" should be "%3E" and "%3C".
> [...]
>> Of course, if you can percent-encode everything needed as you type, you
>> can hand-author the URI data. But, who wants to do that,

> As I noted in a response to Nils earlier in this thread, Firefox/WebKit/Opera don't actually require authors to percent-encode brackets and spaces in data URIs (I'm not sure whether that's correct per spec or not).

> For example
>    data:text/html,<i>here is some italic text<i>
> works just fine in all three.

> So that makes it quite easy to hand-author data URIs, in fact (aside from this "#" gotcha).

Yes, but it's important to know that the browser still percent-decodes everything after the ",". It's just that in this case there are no %HH sequences to decode. You have to be careful here and remember that the data/markup is still not literal. For example, if you want a literal "%5E", you have to write "%255E". If you include a URI containing a bunch of %HH sequences, you have to escape every one of those "%" characters as "%25". So if, while typing, you have no problem typing %25, you should have no problem typing %23.
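
To make the decoding point concrete, here is a minimal Python sketch. It is a simplified, hypothetical model of how a browser reads a data URI, not any browser's actual code: everything after the first "#" is taken as the fragment, the media type ends at the first ",", and the rest is always percent-decoded. The name `decode_data_uri` is made up for illustration, and ";base64" and charset handling are ignored.

```python
from urllib.parse import unquote


def decode_data_uri(uri):
    """Hypothetical, simplified model of how a browser reads a data: URI.

    Returns (media_type, decoded_data, fragment). Assumes `uri` starts with
    "data:" and ignores ";base64" and charset handling entirely.
    """
    # Everything after the first "#" is a fragment identifier, not data.
    uri, _, fragment = uri.partition("#")
    # Drop "data:", then split the media type from the data at the first ",".
    media_type, _, data = uri[len("data:"):].partition(",")
    # The data is always percent-decoded, even when it contains no %HH sequences.
    return media_type, unquote(data), fragment


print(decode_data_uri("data:text/html,<i>italic</i>"))  # ('text/html', '<i>italic</i>', '')
print(decode_data_uri("data:text/html,100%255E"))       # ('text/html', '100%5E', '')
print(decode_data_uri("data:text/html,item #1"))        # ('text/html', 'item ', '1')
print(decode_data_uri("data:text/html,item %231"))      # ('text/html', 'item #1', '')
```

The second call shows why a literal "%5E" has to be typed as "%255E", and the last two show that an unescaped "#" silently cuts off the data while "%23" survives as a literal "#".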

Are you saying that data URI authors know that they have to escape "%", but don't know that they have to escape "#"? Or are you saying that the problem is more serious and data URI authors think the data is *completely* literal? If the latter, we definitely shouldn't be encouraging anything but properly encoded data.
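
For reference, producing properly encoded data doesn't have to be done by hand. A minimal sketch using Python's standard `urllib.parse.quote`, where the helper name `to_data_uri` is hypothetical: it escapes everything the decoder would otherwise interpret, including "%" and "#". Declaring a charset (e.g. "data:text/html;charset=utf-8,") would also be needed for non-ASCII text; this sketch leaves that out.

```python
from urllib.parse import quote


def to_data_uri(markup, media_type="text/html"):
    """Build a percent-encoded data: URI from `markup` (a sketch, not a spec)."""
    # quote() turns every character except letters, digits, "_.-~" and "/"
    # into %HH (non-ASCII text is UTF-8 encoded first), so "%", "#", "<", ">"
    # and spaces are all escaped; "/" is its default safe character.
    return "data:" + media_type + "," + quote(markup)


print(to_data_uri("<i>item #1</i>"))  # data:text/html,%3Ci%3Eitem%20%231%3C/i%3E
print(to_data_uri("100%5E"))          # data:text/html,100%255E
```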

FWIW, I asked for advice on "#" in mailto URIs at <http://lists.w3.org/Archives/Public/public-iri/2009Oct/0030.html> (mailto URI handlers don't make use of frag ids, and frag ids aren't specified for mailto), and I wanted to propose that authors be allowed to write "#" as-is without having to percent-encode it. But that didn't go over too well.

   data:text/html,<i>here is some italic text<i>

I don't really like that, though, as it's not portable. If I copied that from the address field and pasted it into a plain-text document, it'd look funny, like this:

<data:text/html,<i>here is some italic text<i>>

And in mail clients that linkify URIs in plain-text messages, I can see that going wrong, with the link (both the clickable, underlined text and the href) ending up as only "data:text/html,<i".
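
The failure mode can be reproduced with a deliberately naive linkifier. This is a hypothetical Python sketch, not the behavior of any particular mail client: it looks for URIs wrapped in "<" and ">" (the plain-text delimiters suggested in RFC 3986, Appendix C) and takes the first ">" as the end of the URI.

```python
import re

# Hypothetical plain-text linkifier: it looks for URIs wrapped in "<" and ">",
# the delimiters RFC 3986 (Appendix C) suggests for URIs in plain text, and
# takes the first ">" as the end of the URI.
ANGLE_DELIMITED_URI = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^>]*)>")


def linkify_targets(text):
    """Return the link targets this naive linkifier would extract from `text`."""
    return ANGLE_DELIMITED_URI.findall(text)


raw = "<data:text/html,<i>here is some italic text<i>>"
encoded = "<data:text/html,%3Ci%3Ehere%20is%20some%20italic%20text%3Ci%3E>"

print(linkify_targets(raw))      # ['data:text/html,<i']
print(linkify_targets(encoded))  # ['data:text/html,%3Ci%3Ehere%20is%20some%20italic%20text%3Ci%3E']
```

The unencoded form is truncated at the first ">", while the percent-encoded form contains no raw brackets or spaces and comes through whole.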

> So we can proactively check for >/< characters anywhere after the "#", and if we find them, we can pretty safely assume that the author intended the "#" to be part of the document rather than a fragment-ID delimiter.

I still don't like it personally, as it further encourages authors not to encode their data and isn't portable. But if this is to happen, it should definitely be limited to MIME types that contain markup. It wouldn't be useful for data:text/plain (how would you tell the difference in that case?). And for text/javascript, text/css, etc., some other lookahead character(s) would have to be used.
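
A rough Python sketch of the proposed heuristic with the MIME-type restriction applied, purely for illustration: `split_fragment` and the `MARKUP_TYPES` list are hypothetical, and the lookahead is only "<" or ">" anywhere after the first "#". It does nothing special for text/plain, and it doesn't attempt the different lookahead that text/javascript or text/css would need.

```python
# Hypothetical sketch of the proposed heuristic: for markup MIME types only,
# a "#" followed anywhere later by "<" or ">" is treated as part of the data
# rather than as the fragment-ID delimiter. The type list is an assumption.
MARKUP_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "image/svg+xml",
    "text/xml",
    "application/xml",
}


def split_fragment(uri):
    """Return (uri_without_fragment, fragment) for a data: URI under the heuristic."""
    hash_pos = uri.find("#")
    if hash_pos == -1:
        return uri, ""
    # Media type: text between "data:" and the first "," (parameters after ";" dropped).
    media_type = uri[len("data:"):].split(",", 1)[0].split(";", 1)[0].strip().lower()
    after_hash = uri[hash_pos + 1:]
    if media_type in MARKUP_TYPES and ("<" in after_hash or ">" in after_hash):
        # Markup follows the "#", so assume the author meant a literal "#".
        return uri, ""
    # Otherwise, normal behavior: everything after the first "#" is the fragment.
    return uri[:hash_pos], after_hash


print(split_fragment("data:text/html,<b>item #1</b>"))       # ('data:text/html,<b>item #1</b>', '')
print(split_fragment("data:text/html,<b>top</b>#section2"))  # ('data:text/html,<b>top</b>', 'section2')
print(split_fragment("data:text/plain,item #1 <b>"))         # ('data:text/plain,item ', '1 <b>')
```

The last call shows the text/plain problem: with no markup type to key on, the "#" keeps its usual meaning and the data is still cut off.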

--
Michael
