On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:

On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak <m...@apple.com> wrote:
1) It seems like this API is harder to use than a sandboxed iframe. To use it correctly, you need to determine a whitelist of safe elements and attributes; providing an explicit whitelist, at least of tags, is mandatory. With a sandboxed iframe, as a Web developer you can just ask the browser to turn off unsafe things and not worry about designing a security policy. Besides ease of use, there is also the concern that a server-side filtering whitelist may be buggy, and if you apply the same whitelist on the client side as backup, instead of doing something high-level like "disable scripting", then you are less likely to benefit from defense in depth, since you may just replicate the bug.

I should follow up with folks in the ruby-on-rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.

For server-side sanitization, this kind of explicit API is pretty much the only thing you can do.


I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).
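
As a sketch of the comment case (the element id and variable names here are made up, I'm assuming the markup argument comes first, and the exact signature isn't settled):

    // commentMarkup holds the untrusted comment body from the network.
    // Under the proposed semantics, only b, i, em, and strong survive;
    // all other tags and all attributes are dropped, since the
    // attribute whitelist is empty.
    var comment = document.getElementById("comment-42");
    comment.updateWithSanitizedHTML(commentMarkup, "b i em strong", "");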

I can imagine use cases where allowing very open-ended but script-free content is desirable. For example, consider a hosted blog service that wants to let blog authors write nearly arbitrary HTML, but without allowing script. @sandbox would not be a good solution for that use case. In general it does not seem sensible to me that the choice of tag whitelisting vs high-level feature whitelisting is tied to the choice of embedding content directly vs. creating a frame. Is there a technical reason these two choices have to be tied?
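
With @sandbox, about the closest you could get is to serve each post body from its own URL and frame it, roughly like this (the URL is made up):

    <!-- The bare sandbox attribute disables script and plug-ins and
         gives the framed post a unique origin. -->
    <iframe sandbox src="https://posts.example.com/entry/123"></iframe>

That ties the "no script, keep the rest" policy to creating a frame, which is exactly the coupling that seems unnecessary.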


2) It seems like this API loses one of the big benefits of sanitizing HTML in the browser implementation. Specifically, in theory it's safe to say "allow everything except any construct that would result in script/code running". You can't do that on the server side - blacklisting is not sound because you can't predict the capabilities of all browsers. But the browser can predict its own capabilities. Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by <img/src=javascript: and friends.
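
For instance, a naive server-side filter like this sketch (just an illustration, not anyone's real code) lets that string straight through:

    // Tries to strip images, but only matches "<img" followed by
    // whitespace, so "<img/src=javascript:..." passes untouched even
    // though browsers parse it as an <img> element with a src attribute.
    function stripImages(html) {
      return html.replace(/<img\s+[^>]*>/gi, "");
    }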

It's true that this is a benefit. However, it seems like even if you whitelist tags, being able to say "no script" at a high level would still give you defense in depth that merely duplicating the server-side whitelist does not.

Also, this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a GreaseMonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this GreaseMonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.
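
A minimal sketch of that approach (the data and element names are invented, not Flickr's actual markup, and it uses textContent since that's what Gecko supports):

    // exifData stands in for the untrusted response from api.flickr.com.
    var list = document.createElement("ul");
    exifData.forEach(function (field) {
      var row = document.createElement("li");
      // Assigning as plain text means any markup hiding in the
      // metadata is never parsed as HTML.
      row.textContent = field.label + ": " + field.value;
      list.appendChild(row);
    });
    document.getElementById("photo-meta").appendChild(list);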

If the EXIF metadata is supposed to be text-only, it seems like updateWithSanitizedHTML would not be easier to use than innerText, or in any way superior. For cases where it is actually desirable to allow some markup, it's not clear to me that giving explicit whitelists of what is allowed is the simple choice.


I think the benefits of filtering by tag/attribute/scheme for advanced experts are outweighed by these two disadvantages for basic use, compared to something simple like the original staticInnerHTML idea. Another possible alternative is to express how to sanitize at a higher level, using something
similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already.  The
feature string Slashdot wants for its comments is ("a b strong i em",
"href"), but another message board might want something different.
For example, 4chan might want ("img", "src alt").  I don't think these
require particularly advanced experts to understand.
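
Written out as calls (the elements and markup variables are placeholders, and I'm assuming the markup argument comes first):

    // Slashdot-style comments: links plus basic formatting.
    commentCell.updateWithSanitizedHTML(commentHTML, "a b strong i em", "href");

    // Image-board-style comments: inline images only.
    postCell.updateWithSanitizedHTML(postHTML, "img", "src alt");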

updateWithSanitizedHTML and @sandbox both provide features that the other does not for reasons that do not seem technically necessary. For example, updateWithSanitizedHTML could easily have an "allow everything except script" mode, and @sandbox could easily allow per-tag whitelisting. Then the choice would be between the resource cost of a frame, and the sandboxing features that it's impractical to provide without a frame (limiting content to a bounding box while still allowing styling, allowing script without affecting the containing content, etc).


Here's a problem that exists with both this API and innerStaticHTML:

3) There is no secure and efficient way to append sanitized contents to an element that already has children. This may result in authors appending with innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient but still insecure!). I'm willing to concede that use cases other than "replace existing contents" and "append to existing contents" are fairly exotic.
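
That is, an author who needs to append is likely to reach for one of these patterns (a sketch; "log" and the markup variable are placeholders):

    // Inefficient and insecure: re-serializes and re-parses everything
    // already in the element, and inserts the new markup unsanitized.
    log.innerHTML += untrustedMarkup;

    // Efficient, but the markup still goes in unsanitized.
    log.insertAdjacentHTML("beforeend", untrustedMarkup);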

Maybe we need insertAdjacentSanitizedHTML instead or in addition.  ;)

Perhaps. The verb "update" is generic enough that it could handle different kinds of mutations with flags, but perhaps that means it is too vague for a security-sensitive API.

Regards,
Maciej
