Re: [whatwg] Proposal for improved handling of '#' inside of data URIs

Michael A. Puls II Sun, 11 Sep 2011 10:54:08 -0700

On Sun, 11 Sep 2011 11:30:07 -0400, Daniel Holbert <dholb...@mozilla.com> wrote:

> On 09/11/2011 07:21 AM, Michael A. Puls II wrote:
>> Not only must "#" be "%23" if you don't want it as a frag id, but ">"
>> and "<" should be "%3E" and "%3C".
>> [...]
>> Of course, if you can percent-encode everything needed as you type, you
>> can hand-author the URI data. But, who wants to do that,
> As I noted in a response to Nils earlier in this thread, Firefox/Webkit/Opera don't actually require authors to percent-encode brackets and spaces in data URIs. (Not sure whether that's correct per spec or not.)
> For example
>    data:text/html,<i>here is some italic text<i>
> works just fine in all three.
> So that makes it quite easy to hand-author data URIs, in fact (aside from this "#" gotcha).

Yes, but it's important to know that the browser still percent-decodes everything after the ",". It's just that in this case, there are no %HH sequences to decode. You have to be careful here and know that the data/markup is still not literal. For example, if you want a literal "%5E", you have to use "%255E". If you include a URI with a bunch of %HH sequences, you have to escape all those "%" characters. So, if you have no problem typing %25 while authoring, you should have no problem typing %23.
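To illustrate the decoding point, here is a minimal Python sketch that mimics only the percent-decoding step applied to the data portion of a non-base64 data URI. It uses urllib.parse.unquote as a stand-in for a browser's decoder (an assumption; browsers have their own implementations), and the helper name decode_data_payload is hypothetical.

```python
from urllib.parse import unquote


def decode_data_payload(uri: str) -> str:
    """Return the percent-decoded data after the first ',' of a data: URI.

    Hypothetical helper: it ignores the media type, the ;base64 flag and
    any fragment handling, and only shows the %HH decoding step.
    """
    _, _, payload = uri.partition(",")
    return unquote(payload)


# No %HH sequences, so the payload comes back unchanged.
print(decode_data_payload("data:text/html,<i>here is some italic text<i>"))
# A literal "%5E" in the data has to be written as "%255E".
print(decode_data_payload("data:text/plain,%255E"))  # %5E
# Written as plain "%5E", it decodes to "^" instead.
print(decode_data_payload("data:text/plain,%5E"))    # ^
# A literal "#" has to be written as "%23".
print(decode_data_payload("data:text/plain,C%23"))   # C#
```

In other words, typing "%23" for "#" is the same kind of escaping an author already has to do for "%" itself.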

Are you saying that data URI authors know that they have to escape "%", but don't know that they have to escape "#"? Or, are you saying that the problem is more serious and data URI authors think the data is *completely* literal? If the latter, we definitely shouldn't be encouraging anything but properly-encoded data.
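Properly-encoded data does not have to be typed by hand; it can be generated. A minimal Python sketch, assuming urllib.parse.quote with an empty safe set is an acceptable encoder (it escapes more than strictly necessary, which is harmless). The helper name make_data_uri is hypothetical.

```python
from urllib.parse import quote, unquote


def make_data_uri(media_type: str, text: str) -> str:
    """Build a non-base64 data: URI whose payload is fully percent-encoded.

    With safe="", quote() leaves only ASCII letters, digits and "_.-~"
    untouched, so '#', '<', '>', '%', '/' and spaces are all escaped.
    Text is encoded as UTF-8 first (quote's default).
    """
    return "data:" + media_type + "," + quote(text, safe="")


markup = "<b>#1</b> costs 50%"
uri = make_data_uri("text/html", markup)
print(uri)  # data:text/html,%3Cb%3E%231%3C%2Fb%3E%20costs%2050%25

# Decoding the payload gives back the original markup unchanged.
assert unquote(uri.partition(",")[2]) == markup
```

The resulting URI contains no "#", "<", ">" or spaces, so it is unambiguous regardless of how the "#" question is resolved.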

FWIW, I asked for advice on "#" in mailto URIs (since mailto URI handlers don't make use of frag ids for mailto and frag ids are not specified for mailto) at <http://lists.w3.org/Archives/Public/public-iri/2009Oct/0030.html> and wanted to propose that '#' be allowed as-is when authoring, without having to percent-encode it. But, that didn't go over too well.

>    data:text/html,<i>here is some italic text<i>

I don't really like that, though, as it's not portable. If I wanted to copy that from the address field and paste it into a plain-text document, it'd look funny, like this:


<data:text/html,<i>here is some italic text<i>>

And, for mail clients that linkify links in plain-text messages, I can see that going wrong, with the link (the clickable, underlined part and the href) ending up as only "data:text/html,<i".
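To make the linkification concern concrete, here is a sketch of a hypothetical plain-text linkifier that treats <...> as the delimiters around a URI (the convention RFC 3986, Appendix C suggests for URIs in plain text). Real mail clients use their own heuristics; the regex and the function name extract_links are assumptions for illustration only.

```python
import re

# Hypothetical linkifier rule: a URI wrapped in <...> ends at the first '>'.
ANGLE_BRACKET_URI = re.compile(r"<((?:data|https?|mailto):[^>]*)>")


def extract_links(text: str) -> list[str]:
    """Return every <...>-delimited URI found in a plain-text message."""
    return ANGLE_BRACKET_URI.findall(text)


raw = "<data:text/html,<i>here is some italic text<i>>"
encoded = "<data:text/html,%3Ci%3Ehere%20is%20some%20italic%20text%3Ci%3E>"

print(extract_links(raw))      # ['data:text/html,<i']
print(extract_links(encoded))  # the whole encoded URI survives intact
```

The unencoded "<" and ">" inside the data collide with the delimiters, while the percent-encoded form has nothing for the linkifier to trip over.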

> So we can proactively check for >/< characters anywhere after the "#", and if we find them, then we can pretty safely assume that the author intended for the "#" to be part of the document, rather than a fragment-ID delimiter.

I still don't like it personally, as it further encourages authors not to encode their data and is not portable. But, if this is to happen, it should definitely be limited to MIME types that contain markup. It wouldn't be useful for data:text/plain (how would you differentiate in that case?). And, for text/javascript, text/css, etc., some other type of lookahead character(s) would have to be used.
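For discussion purposes, the proposed lookahead, restricted to markup types as suggested above, might look roughly like this. It is purely illustrative: the MARKUP_TYPES set, the function name split_fragment and the exact rule are assumptions, not anything specified.

```python
from typing import Optional, Tuple

# Hypothetical set of media types treated as "markup" for this heuristic.
MARKUP_TYPES = {"text/html", "application/xhtml+xml", "image/svg+xml", "text/xml"}


def split_fragment(uri: str) -> Tuple[str, Optional[str]]:
    """Split a data: URI into (uri_without_fragment, fragment_or_None).

    Sketch of the proposed heuristic: for markup media types, a '#'
    followed somewhere later by '<' or '>' is assumed to belong to the
    data, so the URI is returned unsplit. Other types split at the
    first '#' as usual.
    """
    head, sep, rest = uri.partition("#")
    if not sep:
        return uri, None
    # Media type: text after "data:" up to the first ',' and then ';'.
    meta = head[len("data:"):].split(",", 1)[0]
    media_type = meta.split(";", 1)[0].strip().lower()
    if media_type in MARKUP_TYPES and ("<" in rest or ">" in rest):
        return uri, None
    return head, rest


print(split_fragment("data:text/html,<p>C#</p>"))      # ('data:text/html,<p>C#</p>', None)
print(split_fragment("data:text/html,<p>hi</p>#top"))  # ('data:text/html,<p>hi</p>', 'top')
print(split_fragment("data:text/plain,C# <b>"))        # ('data:text/plain,C', ' <b>')
```

The last call shows the text/plain problem: with no markup to key on, the "#" is still taken as a fragment delimiter.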


--
Michael
