On 26.08.10 01:41, Adam Barth wrote:
On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson <i...@hixie.ch> wrote:
On Wed, 25 Aug 2010, Adam Barth wrote:
HTML should support Base64-encoded entities to make it easier for
authors to include untrusted content in their documents without
risking XSS.

Seems like a fine idea. Get browsers to implement it and I'll spec it.

I've posted a patch for WebKit:

https://bugs.webkit.org/show_bug.cgi?id=44641

Some subtleties:

1) Some base64 decoders tolerate newlines.  We don't want to decode
entities with newlines.
2) Decoding base64 results in binary data.  We'll need to convert that
data to characters in order to deal with it in the DOM.  We always use
UTF-8 for that transformation, regardless of the document's
encoding.
3) Null characters are replaced with U+FFFD.
4) The empty base64 entity &%; is consumed and is replaced with the
empty string.
5) Invalid base64 is rejected and the entity is not decoded.
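
For readers who want to experiment with these rules, here is a rough Python sketch of the five points above. It is not the WebKit patch. The names decode_base64_entity and replace_entities are made up for illustration, the regular expression only approximates how a tokenizer would find "&%...;" sequences, and leaving an entity undecoded when its bytes are not valid UTF-8 is an assumption (the list above does not cover that case).

```python
import base64
import binascii
import re

# Hypothetical matcher for "&%<payload>;" (a real tokenizer would differ).
ENTITY = re.compile(r"&%([^;]*);")
# Strict base64 alphabet plus optional padding; no whitespace or newlines (rule 1).
STRICT_B64 = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def decode_base64_entity(payload):
    """Return the decoded text, or None if the entity must stay undecoded."""
    if payload == "":
        return ""  # rule 4: the empty entity becomes the empty string
    if not STRICT_B64.fullmatch(payload):
        return None  # rules 1 and 5: newlines and other stray characters
    try:
        raw = base64.b64decode(payload, validate=True)
        # rule 2: always UTF-8, regardless of the document's encoding
        text = raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # rule 5 for bad base64; for bad UTF-8 this is an assumption
        return None
    return text.replace("\x00", "\ufffd")  # rule 3: NUL becomes U+FFFD


def replace_entities(html):
    """Decode every base64 entity in html, leaving rejected ones untouched."""
    def substitute(match):
        decoded = decode_base64_entity(match.group(1))
        return match.group(0) if decoded is None else decoded
    return ENTITY.sub(substitute, html)
```

With this sketch, replace_entities("a &%4oCT; b") yields "a \u2013 b", while an entity whose payload contains a newline is left exactly as written.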

Adam


Is it necessary to consider compatibility issues here? In HTML4 this
seems to have been valid code (-> http://validator.w3.org/check):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
<title>base64 entity test</title>
</head>
<body>
<p>Look at these fine ASCII characters: &%4oCT;</p>
</body>
</html>

Now it would be interpreted differently. Could this lead to old
documents changing in meaning? Do we have to consider old documents that were not completely valid (e.g. lacked a doctype declaration)?
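
To make the change in meaning concrete, here is a small Python check of what the payload in the test document would decode to, assuming the proposal's rule of base64 decoding followed by UTF-8 decoding. The variable names are only for illustration.

```python
import base64

# Payload taken from the entity "&%4oCT;" in the test document.
payload = "4oCT"

# Base64 decoding gives raw bytes, which are then read as UTF-8.
raw = base64.b64decode(payload, validate=True)
decoded = raw.decode("utf-8")

# List the resulting code points and check whether they are still ASCII.
codepoints = [f"U+{ord(ch):04X}" for ch in decoded]
is_ascii = decoded.isascii()

print(codepoints, is_ascii)
```

The three bytes E2 80 93 decode to U+2013 (EN DASH), so the paragraph would end with a non-ASCII character instead of the literal text "&%4oCT;", even though the document declares charset=US-ASCII.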
