Re: [whatwg] base64 entities

Martin Janecke Thu, 26 Aug 2010 01:38:52 -0700

On 26.08.10 01:41, Adam Barth wrote:

On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson<i...@hixie.ch>  wrote:

On Wed, 25 Aug 2010, Adam Barth wrote:

HTML should support Base64-encoded entities to make it easier for
authors to include untrusted content in their documents without
risking XSS.


Seems like a fine idea. Get browsers to implement it and I'll spec it.


I've posted a patch for WebKit:

https://bugs.webkit.org/show_bug.cgi?id=44641

Some subtleties:

1) Some base64 decoders tolerate newlines.  We don't want to decode
entities with newlines.
2) Decoding base64 results in binary data.  We'll need to convert that
data to characters in order to deal with it in the DOM.  We always use
UTF-8 for that transformation, regardless of the document's
encoding.
3) Null characters are replaced with U+FFFD.
4) The empty base64 entity &%; is consumed and is replaced with the
empty string.
5) Invalid base64 is rejected and the entity is not decoded.
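
For readers who want to experiment with these five rules, here is a minimal
Python sketch. It is not the WebKit patch. The function name
decode_base64_entity is hypothetical, and two behaviours are assumptions that
the list above does not settle: padding is required (length a multiple of 4),
and bytes that are not valid UTF-8 cause the entity to be left undecoded.

```python
import base64
import binascii
import re

# Strict alphabet: no newlines or other whitespace are tolerated (rule 1).
_B64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def decode_base64_entity(entity):
    """Return the decoded text for an entity such as '&%4oCT;'.

    Returns None when the entity should be left undecoded (rule 5).
    """
    if not (entity.startswith("&%") and entity.endswith(";")):
        return None
    payload = entity[2:-1]
    if payload == "":
        return ""  # rule 4: the empty entity becomes the empty string
    # Assumption: padded base64 only, so the length must be a multiple of 4.
    if len(payload) % 4 != 0 or not _B64_RE.fullmatch(payload):
        return None
    try:
        raw = base64.b64decode(payload, validate=True)
        # Rule 2: always UTF-8, regardless of the document's encoding.
        # Assumption: invalid UTF-8 is treated like invalid base64.
        text = raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    return text.replace("\x00", "\ufffd")  # rule 3: NUL becomes U+FFFD
```

With this sketch, decode_base64_entity("&%;") gives the empty string, and a
payload containing a newline is left undecoded.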

Adam


Is it necessary to consider compatibility issues here? In HTML4 this
seems to have been valid code (-> http://validator.w3.org/check):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
<title>base64 entity test</title>
</head>
<body>
<p>Look at these fine ASCII characters: &%4oCT;</p>
</body>
</html>

Now it would be interpreted differently. Could this lead to old
documents changing in meaning? Do we have to consider old documents that
were not completely valid (e.g. lacked a doctype declaration)?
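
To make the concern concrete, the following rough Python sketch shows how the
paragraph from the test document might read once base64 entities are decoded.
The regular expression and the name render_with_base64_entities are
illustrative assumptions, not taken from the proposal or the patch, and NUL
handling is omitted for brevity.

```python
import base64
import binascii
import re

# Hypothetical entity shape: "&%", base64 characters, optional padding, ";".
ENTITY_RE = re.compile(r"&%([A-Za-z0-9+/]*={0,2});")


def render_with_base64_entities(markup):
    """Replace every decodable base64 entity in markup with its UTF-8 text."""

    def replace(match):
        try:
            raw = base64.b64decode(match.group(1), validate=True)
            return raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return match.group(0)  # not decodable: keep the text as written

    return ENTITY_RE.sub(replace, markup)


old = "<p>Look at these fine ASCII characters: &%4oCT;</p>"
print(render_with_base64_entities(old))
```

The payload 4oCT decodes to the bytes E2 80 93, which is U+2013 (EN DASH) in
UTF-8, so the paragraph would end with a non-ASCII dash instead of the literal
text &%4oCT;.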
