Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
2009/12/1 Kornel Lesiński:
>>> And even a whitelist for CSS properties couldn't be used to implement a "no
>>> external access" policy (allow images with data: URLs, allow http: links,
>>> but not http: images). This would be useful for webmails and other places
>>> where the website doesn't want to allow third parties to track views.
>>
>> I don't think a no-external-access policy is worth supporting
>> explicitly. If it falls out of a general design, that's great, but I
>> don't think the use case is compelling enough to accept the design
>> constraints required to support it.
>
> I think it is quite important for privacy. Otherwise "web bugs" can be
> placed and used to track every use of content in every mashup.
>
> Most often I'd like formatted text in applications to be just text,
> _completely_ passive.

I agree that it's a nice benefit of some designs, but, in my opinion,
it's not nearly as important as addressing the security issues. More
concretely, suppose you want to let folks include hyperlinks in
sanitized HTML, which I suspect many people will want to do. You've
already lost the battle against web bugs because of DNS prefetch.

>>> A "no clickjacking" option might be useful as well.
>>
>> I don't have a clear idea how this would work. Did you have something
>> different in mind than X-Frame-Options (already supported by WebKit)?
>
> On second thought, clickjacking is probably not the right term for what
> I have in mind, although it's a similar issue.
>
> The problem is that content added to the DOM could use styles to overlay
> the web application's "chrome" and steal data with forms, or redirect
> standard links/buttons to a phishing site, e.g. [an element with
> style="position:fixed; top:0; right:0"] that's positioned on top of the
> website's standard login form.
>
> position:fixed escapes elements with "position:relative; overflow:hidden",
> so AFAIK this cannot be prevented without removing all position:fixed
> styles from untrusted content.
>
> Such a hack has been used on the auction site allegro.pl, where auctions'
> descriptions are allowed to use CSS. In that particular case the content
> should have been filtered server-side, but I imagine webmails, web-based
> feed readers, and all kinds of mashups dynamically loading untrusted
> content could face similar problems, and having [a server-side filter]
> for every bit of content is sometimes problematic.

I agree that this is a threat worth addressing. That's one reason why
an API that blocks only script is insufficient for inline use cases.
Notice that my proposal does mitigate this threat.

Adam
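The position:fixed stripping Kornel describes can be sketched as a declaration-level filter over an untrusted style attribute value. This is an illustrative sketch only: the function and parameter names are my own, not part of the proposal, and a real implementation would need a proper CSS parser rather than string splitting.

```javascript
// Sketch: filter an untrusted style attribute against a property
// whitelist, additionally dropping "position: fixed" because it escapes
// position:relative; overflow:hidden containers (Kornel's overlay attack).
function filterStyleAttribute(styleText, allowedProps) {
  return styleText
    .split(";")
    .map((decl) => decl.trim())
    .filter((decl) => {
      const i = decl.indexOf(":");
      if (i < 0) return false; // malformed declaration: drop it
      const name = decl.slice(0, i).trim().toLowerCase();
      const value = decl.slice(i + 1).trim().toLowerCase();
      if (!allowedProps.has(name)) return false;
      // Even when "position" itself is whitelisted, forbid the fixed value.
      if (name === "position" && value === "fixed") return false;
      return true;
    })
    .join("; ");
}
```

With `new Set(["color", "position", "top"])` as the whitelist, `"color: red; position: fixed; top: 0"` comes back as `"color: red; top: 0"`: the overlay-enabling declaration is gone while ordinary positioning survives.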
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
>> And even a whitelist for CSS properties couldn't be used to implement a "no
>> external access" policy (allow images with data: URLs, allow http: links,
>> but not http: images). This would be useful for webmails and other places
>> where the website doesn't want to allow third parties to track views.
>
> I don't think a no-external-access policy is worth supporting
> explicitly. If it falls out of a general design, that's great, but I
> don't think the use case is compelling enough to accept the design
> constraints required to support it.

I think it is quite important for privacy. Otherwise "web bugs" can be
placed and used to track every use of content in every mashup.

Most often I'd like formatted text in applications to be just text,
_completely_ passive.

>> A "no clickjacking" option might be useful as well.
>
> I don't have a clear idea how this would work. Did you have something
> different in mind than X-Frame-Options (already supported by WebKit)?

On second thought, clickjacking is probably not the right term for what I
have in mind, although it's a similar issue.

The problem is that content added to the DOM could use styles to overlay
the web application's "chrome" and steal data with forms, or redirect
standard links/buttons to a phishing site, e.g. [an element with
style="position:fixed; top:0; right:0"] that's positioned on top of the
website's standard login form.

position:fixed escapes elements with "position:relative; overflow:hidden",
so AFAIK this cannot be prevented without removing all position:fixed
styles from untrusted content.

Such a hack has been used on the auction site allegro.pl, where auctions'
descriptions are allowed to use CSS. In that particular case the content
should have been filtered server-side, but I imagine webmails, web-based
feed readers, and all kinds of mashups dynamically loading untrusted
content could face similar problems, and having [a server-side filter]
for every bit of content is sometimes problematic.

-- 
regards, Kornel
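The "no external access" policy above (data: images, http: links, no http: images) amounts to a per-tag, per-attribute URL scheme check. A minimal sketch, with hypothetical names, using the modern `URL` constructor (which postdates this thread):

```javascript
// Sketch of Kornel's "no external access" policy: images may only use
// data: URLs (no network fetch, so no tracking pixel), while links may
// point at http(s). Everything else is rejected.
function urlAllowed(tagName, attrName, url) {
  let scheme;
  try {
    // The base resolves relative URLs; "example.invalid" is a placeholder.
    scheme = new URL(url, "https://example.invalid/").protocol;
  } catch (e) {
    return false; // unparseable URL: reject
  }
  if (tagName === "img" && attrName === "src") {
    return scheme === "data:";
  }
  if (tagName === "a" && attrName === "href") {
    return scheme === "http:" || scheme === "https:";
  }
  return false;
}
```

Note that, as Adam points out in his reply, allowing http: hyperlinks at all already leaks information via DNS prefetch, so this policy narrows tracking rather than eliminating it.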
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
2009/12/1 Kornel Lesiński:
>> The WebKit community is considering taking up such an experimental
>> implementation. Here's my current proposal for how this might work:
>>
>> http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNA&hl=en
>>
>> I would appreciate any feedback on the design.
>
> A whitelist requires developers to know about the potential risks of each
> element/property, and that's not obvious to everyone: e.g. one might want
> to allow object/embed (for harmless YouTube videos) without realizing that
> it enables XSS.

That's true. It would be interesting to know how often developers
screw this up with Ruby on Rails' version of the API.

> It's also non-obvious that the style attribute is an XSS risk (via the
> behavior property). A higher-level filtering option could allow the style
> attribute and only filter out that property. The current proposal would
> need another whitelist for CSS properties.

Script-in-CSS is subtle enough that it's explicitly blocked (like
javascript: URLs).

> And even a whitelist for CSS properties couldn't be used to implement a "no
> external access" policy (allow images with data: URLs, allow http: links,
> but not http: images). This would be useful for webmails and other places
> where the website doesn't want to allow third parties to track views.

I don't think a no-external-access policy is worth supporting
explicitly. If it falls out of a general design, that's great, but I
don't think the use case is compelling enough to accept the design
constraints required to support it.

> A "no clickjacking" option might be useful as well.

I don't have a clear idea how this would work. Did you have something
different in mind than X-Frame-Options (already supported by WebKit)?

Adam
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
> The WebKit community is considering taking up such an experimental
> implementation. Here's my current proposal for how this might work:
>
> http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNA&hl=en
>
> I would appreciate any feedback on the design.

A whitelist requires developers to know about the potential risks of each
element/property, and that's not obvious to everyone: e.g. one might want
to allow object/embed (for harmless YouTube videos) without realizing that
it enables XSS.

It's also non-obvious that the style attribute is an XSS risk (via the
behavior property). A higher-level filtering option could allow the style
attribute and only filter out that property. The current proposal would
need another whitelist for CSS properties.

And even a whitelist for CSS properties couldn't be used to implement a "no
external access" policy (allow images with data: URLs, allow http: links,
but not http: images). This would be useful for webmails and other places
where the website doesn't want to allow third parties to track views.

A "no clickjacking" option might be useful as well.

-- 
regards, Kornel Lesiński
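Kornel's first objection is that a plain tag whitelist carries no warning signs. A sketch of what such a check looks like (names and default list are my own) makes the point: nothing in its shape tells the caller that adding "object" or "embed" silently reopens script execution.

```javascript
// Sketch of a naive per-tag whitelist check. Note that the API surface
// gives no hint which tags are dangerous: a developer can add "embed"
// (wanting video) and get plugin-based XSS with no warning.
const SAFE_DEFAULT = new Set(["b", "i", "em", "strong", "p", "br"]);

function tagAllowed(tagName, whitelist = SAFE_DEFAULT) {
  return whitelist.has(tagName.toLowerCase());
}
```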
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
Your main point is well taken. There are some technical reasons why
tag whitelisting makes more sense for inline content. For example,
consider the case you mentioned on webkit-dev: @id. Inline, @id is
problematic because the ids exist in a per-frame namespace, whereas
they're harmless when the untrusted content has an entire iframe to
itself. Of course, @id is not unique in this respect. For example,
[an <input type="password">] will likely get autofilled by the password
manager inline, and @style can be used to draw all over the page without
an iframe's layout constraints.

That said, I'm not married to a design with a tag-level whitelist. Do
you have a specific alternative in mind?

Adam

On Mon, Nov 30, 2009 at 7:43 PM, Maciej Stachowiak wrote:
>
> On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:
>
>> On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak wrote:
>>>
>>> 1) It seems like this API is harder to use than a sandboxed iframe. To
>>> use it correctly, you need to determine a whitelist of safe elements
>>> and attributes; providing an explicit whitelist at least of tags is
>>> mandatory. With a sandboxed iframe, as a Web developer you can just
>>> ask the browser to turn off unsafe things and not worry about
>>> designing a security policy. Besides ease of use, there is also the
>>> concern that a server-side filtering whitelist may be buggy, and if
>>> you apply the same whitelist on the client side as backup instead of
>>> doing something high level like "disable scripting" then you are less
>>> likely to benefit from defense in depth, since you may just replicate
>>> the bug.
>>
>> I should follow up with folks in the Ruby on Rails community to see
>> how they view their sanitize API. The one person I asked had a
>> positive opinion, but we should get a bigger sample size.
>
> For server-side sanitization, this kind of explicit API is pretty much
> the only thing you can do.
>
>> I think updateWithSanitizedHTML has different use cases than @sandbox.
>> I think the killer applications for @sandbox are advertisements and
>> gadgets. In those cases, the developer wants most of the browser's
>> functionality, but wants to turn off some dangerous stuff (like
>> plug-ins). For updateWithSanitizedHTML, the killer application is
>> something like blog comments, where you basically want text with some
>> formatting tags (bold, italics, and maybe images depending on the
>> forum).
>
> I can imagine use cases where allowing very open-ended but script-free
> content is desirable. For example, consider a hosted blog service that
> wants to let blog authors write nearly arbitrary HTML, but without
> allowing script. @sandbox would not be a good solution for that use
> case. In general it does not seem sensible to me that the choice of tag
> whitelisting vs. high-level feature whitelisting is tied to the choice
> of embedding content directly vs. creating a frame. Is there a technical
> reason these two choices have to be tied?
>
>>> 2) It seems like this API loses one of the big benefits of sanitizing
>>> HTML in the browser implementation. Specifically, in theory it's safe
>>> to say "allow everything except any construct that would result in
>>> script/code running". You can't do that on the server side -
>>> blacklisting is not sound because you can't predict the capabilities
>>> of all browsers. But the browser can predict its own capabilities.
>>> Sandboxed iframes do allow for this.
>>
>> The benefit is that you know you're getting the right parsing. You're
>> not going to be tripped up by [stripped markup example]
>
> It's true, this is a benefit. However, it seems like even if you
> whitelist tags, being able to say "no script" at a high level [...]
>
>> Also, this API is useful in cases where you don't have a server to help
>> you sanitize your input. One example I saw recently was a GreaseMonkey
>> script that wanted to add EXIF metadata to Flickr. Basically, the
>> script grabbed the EXIF data from api.flickr.com and added it to the
>> current page. Unfortunately, that meant I could use this GreaseMonkey
>> script to XSS Flickr by adding HTML to my EXIF metadata. Sure, there
>> are other ways of solving the problem (I asked the developer to build
>> the DOM in memory and use innerText), but you want something simple
>> for these cases.
>
> If the EXIF metadata is supposed to be text-only, it seems like
> updateWithSanitizedHTML would not be easier to use than innerText, or in
> any way superior. For cases where it is actually desirable to allow some
> markup, it's not clear to me that giving explicit whitelists of what is
> allowed is the simple choice.
>
>>> I think the benefits of filtering by tag/attribute/scheme for advanced
>>> experts are outweighed by these two disadvantages for basic use,
>>> compared to something simple like the original staticInnerHTML idea.
>>> Another possible alternative is to express how to sanitize at a higher
>>> level, using something similar to sandboxed iframe feature strings.
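Adam's @id point is about namespace collisions: inline, an untrusted id shares the frame's namespace with the host page's own ids, so an inline sanitizer might simply refuse it. A sketch of such an attribute rule, with hypothetical names (the thread does not specify this behavior):

```javascript
// Sketch of an inline-context attribute check. The special cases model
// Adam's argument: @id collides with the page's per-frame id namespace,
// and on* attributes are script outright, so neither is whitelistable
// for inline content even if the caller asks for them.
function attributeAllowedInline(attrName, attrWhitelist) {
  const name = attrName.toLowerCase();
  if (name === "id") return false;          // per-frame namespace collision
  if (name.startsWith("on")) return false;  // event handlers are script
  return attrWhitelist.has(name);
}
```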
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:

> On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak wrote:
>>
>> 1) It seems like this API is harder to use than a sandboxed iframe. To
>> use it correctly, you need to determine a whitelist of safe elements and
>> attributes; providing an explicit whitelist at least of tags is
>> mandatory. With a sandboxed iframe, as a Web developer you can just ask
>> the browser to turn off unsafe things and not worry about designing a
>> security policy. Besides ease of use, there is also the concern that a
>> server-side filtering whitelist may be buggy, and if you apply the same
>> whitelist on the client side as backup instead of doing something high
>> level like "disable scripting" then you are less likely to benefit from
>> defense in depth, since you may just replicate the bug.
>
> I should follow up with folks in the Ruby on Rails community to see
> how they view their sanitize API. The one person I asked had a
> positive opinion, but we should get a bigger sample size.

For server-side sanitization, this kind of explicit API is pretty much
the only thing you can do.

> I think updateWithSanitizedHTML has different use cases than @sandbox.
> I think the killer applications for @sandbox are advertisements and
> gadgets. In those cases, the developer wants most of the browser's
> functionality, but wants to turn off some dangerous stuff (like
> plug-ins). For updateWithSanitizedHTML, the killer application is
> something like blog comments, where you basically want text with some
> formatting tags (bold, italics, and maybe images depending on the
> forum).

I can imagine use cases where allowing very open-ended but script-free
content is desirable. For example, consider a hosted blog service that
wants to let blog authors write nearly arbitrary HTML, but without
allowing script. @sandbox would not be a good solution for that use case.
In general it does not seem sensible to me that the choice of tag
whitelisting vs. high-level feature whitelisting is tied to the choice of
embedding content directly vs. creating a frame. Is there a technical
reason these two choices have to be tied?

>> 2) It seems like this API loses one of the big benefits of sanitizing
>> HTML in the browser implementation. Specifically, in theory it's safe
>> to say "allow everything except any construct that would result in
>> script/code running". You can't do that on the server side -
>> blacklisting is not sound because you can't predict the capabilities of
>> all browsers. But the browser can predict its own capabilities.
>> Sandboxed iframes do allow for this.
>
> The benefit is that you know you're getting the right parsing. You're
> not going to be tripped up by [stripped markup example]

It's true, this is a benefit. However, it seems like even if you
whitelist tags, being able to say "no script" at a high level [...]

> Also, this API is useful in cases where you don't have a server to help
> you sanitize your input. One example I saw recently was a GreaseMonkey
> script that wanted to add EXIF metadata to Flickr. Basically, the
> script grabbed the EXIF data from api.flickr.com and added it to the
> current page. Unfortunately, that meant I could use this GreaseMonkey
> script to XSS Flickr by adding HTML to my EXIF metadata. Sure, there
> are other ways of solving the problem (I asked the developer to build
> the DOM in memory and use innerText), but you want something simple
> for these cases.

If the EXIF metadata is supposed to be text-only, it seems like
updateWithSanitizedHTML would not be easier to use than innerText, or in
any way superior. For cases where it is actually desirable to allow some
markup, it's not clear to me that giving explicit whitelists of what is
allowed is the simple choice.

>> I think the benefits of filtering by tag/attribute/scheme for advanced
>> experts are outweighed by these two disadvantages for basic use,
>> compared to something simple like the original staticInnerHTML idea.
>> Another possible alternative is to express how to sanitize at a higher
>> level, using something similar to sandboxed iframe feature strings.
>
> If you think of @sandbox as being optimized for rich untrusted content
> and updateWithSanitizedHTML as being optimized for poor untrusted
> content, then you'll see that's what the API does already. The feature
> string Slashdot wants for its comments is ("a b strong i em", "href"),
> but another message board might want something different. For example,
> 4chan might want ("img", "src alt"). I don't think these require
> particularly advanced experts to understand.

updateWithSanitizedHTML and @sandbox both provide features that the other
does not, for reasons that do not seem technically necessary. For example,
updateWithSanitizedHTML could easily have an "allow everything except
script" mode, and @sandbox could easily allow per-tag whitelisting. Then
the choice would be between the resource cost of a frame, and the
sandboxing features that it's impractical to provide without a [...]
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak wrote:
> 1) It seems like this API is harder to use than a sandboxed iframe. To
> use it correctly, you need to determine a whitelist of safe elements and
> attributes; providing an explicit whitelist at least of tags is
> mandatory. With a sandboxed iframe, as a Web developer you can just ask
> the browser to turn off unsafe things and not worry about designing a
> security policy. Besides ease of use, there is also the concern that a
> server-side filtering whitelist may be buggy, and if you apply the same
> whitelist on the client side as backup instead of doing something high
> level like "disable scripting" then you are less likely to benefit from
> defense in depth, since you may just replicate the bug.

I should follow up with folks in the Ruby on Rails community to see
how they view their sanitize API. The one person I asked had a
positive opinion, but we should get a bigger sample size.

I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets. In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins). For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).

> 2) It seems like this API loses one of the big benefits of sanitizing
> HTML in the browser implementation. Specifically, in theory it's safe to
> say "allow everything except any construct that would result in
> script/code running". You can't do that on the server side -
> blacklisting is not sound because you can't predict the capabilities of
> all browsers. But the browser can predict its own capabilities.
> Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing. You're
not going to be tripped up by [stripped markup example]

> I think the benefits of filtering by tag/attribute/scheme for advanced
> experts are outweighed by these two disadvantages for basic use,
> compared to something simple like the original staticInnerHTML idea.
> Another possible alternative is to express how to sanitize at a higher
> level, using something similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already. The feature
string Slashdot wants for its comments is ("a b strong i em", "href"),
but another message board might want something different. For example,
4chan might want ("img", "src alt"). I don't think these require
particularly advanced experts to understand.

> Here's a problem that exists with both this API and also innerStaticHTML:
>
> 3) There is no secure and efficient way to append sanitized contents to
> an element that already has children. This may result in authors
> appending with innerHTML += (inefficient and insecure!) or
> insertAdjacentHTML() (efficient but still insecure!). I'm willing to
> concede that use cases other than "replace existing contents" and
> "append to existing contents" are fairly exotic.

Maybe we need insertAdjacentSanitizedHTML instead or in addition. ;)

Adam
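The feature strings Adam quotes, ("a b strong i em", "href"), read naturally as space-separated tag and attribute whitelists. A small parsing helper, sketched under that assumption (the helper name is mine, not from the proposal):

```javascript
// Parse a space-separated feature string into a lowercase whitelist set,
// as the ("a b strong i em", "href") examples in the thread suggest.
function parseFeatureString(s) {
  return new Set(
    s.trim().split(/\s+/).filter(Boolean).map((t) => t.toLowerCase())
  );
}

// A Slashdot-style comment policy from the thread's example:
const slashdotTags = parseFeatureString("a b strong i em");
const slashdotAttrs = parseFeatureString("href");
```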
Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Nov 30, 2009, at 3:55 PM, Adam Barth wrote:

> On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson wrote:
>> Defining a spec-blessed whitelist of elements, attributes, and attribute
>> values, and filtering at the parser level, is a significant new feature.
>> While I see that it has value, I think in the short term it would be
>> better to wait for a future version of HTML before introducing this
>> feature; ideally once we have more implementation experience with
>> experimental versions of this idea.
>>
>> I would encourage browser vendors to introduce APIs similar to that
>> discussed below, clearly marked as vendor-specific (e.g. for Firefox,
>> something like .mozStaticInnerHTML).
>
> The WebKit community is considering taking up such an experimental
> implementation. Here's my current proposal for how this might work:
>
> http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNA&hl=en
>
> I would appreciate any feedback on the design.

I neglected to give feedback on webkit-dev, but here are my comments:

1) It seems like this API is harder to use than a sandboxed iframe. To
use it correctly, you need to determine a whitelist of safe elements and
attributes; providing an explicit whitelist at least of tags is
mandatory. With a sandboxed iframe, as a Web developer you can just ask
the browser to turn off unsafe things and not worry about designing a
security policy. Besides ease of use, there is also the concern that a
server-side filtering whitelist may be buggy, and if you apply the same
whitelist on the client side as backup instead of doing something high
level like "disable scripting" then you are less likely to benefit from
defense in depth, since you may just replicate the bug.

2) It seems like this API loses one of the big benefits of sanitizing
HTML in the browser implementation. Specifically, in theory it's safe to
say "allow everything except any construct that would result in
script/code running". You can't do that on the server side - blacklisting
is not sound because you can't predict the capabilities of all browsers.
But the browser can predict its own capabilities. Sandboxed iframes do
allow for this.

I think the benefits of filtering by tag/attribute/scheme for advanced
experts are outweighed by these two disadvantages for basic use, compared
to something simple like the original staticInnerHTML idea. Another
possible alternative is to express how to sanitize at a higher level,
using something similar to sandboxed iframe feature strings.

Here's a problem that exists with both this API and also innerStaticHTML:

3) There is no secure and efficient way to append sanitized contents to
an element that already has children. This may result in authors
appending with innerHTML += (inefficient and insecure!) or
insertAdjacentHTML() (efficient but still insecure!). I'm willing to
concede that use cases other than "replace existing contents" and
"append to existing contents" are fairly exotic.

Regards,
Maciej
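Maciej's point 3 can be modeled at the string/list level: the safe append path sanitizes only the new fragment and leaves existing children untouched, whereas `innerHTML +=` re-serializes and re-parses everything already in the element. This sketch is my own model of the hazard, not an API from the thread, and the toy `sanitize` callback below is purely illustrative.

```javascript
// Sketch of the append-safely shape Maciej's point 3 calls for: only the
// untrusted new fragment goes through the sanitizer; already-inserted
// (already-trusted) children are not re-serialized or re-parsed.
function appendSanitized(existingChildren, untrustedFragment, sanitize) {
  return existingChildren.concat([sanitize(untrustedFragment)]);
}
```

By contrast, `el.innerHTML = el.innerHTML + fragment` costs time proportional to all existing content and, because serialize-then-reparse is not always a round trip, can change the meaning of content that was previously safe.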
[whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)
On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson wrote:
> Defining a spec-blessed whitelist of elements, attributes, and attribute
> values, and filtering at the parser level, is a significant new feature.
> While I see that it has value, I think in the short term it would be
> better to wait for a future version of HTML before introducing this
> feature; ideally once we have more implementation experience with
> experimental versions of this idea.
>
> I would encourage browser vendors to introduce APIs similar to that
> discussed below, clearly marked as vendor-specific (e.g. for Firefox,
> something like .mozStaticInnerHTML).

The WebKit community is considering taking up such an experimental
implementation. Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNA&hl=en

I would appreciate any feedback on the design.

Thanks,
Adam
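Hixie's vendor-prefix suggestion implies page authors would feature-detect whichever experimental API their browser ships. A sketch of that detection, where every method name is hypothetical (none of these APIs shipped; `.mozStaticInnerHTML` is Hixie's example, the webkit variant is my extrapolation):

```javascript
// Feature-detection sketch for hypothetical vendor-prefixed sanitizing
// setters. Falls through to an error so callers know they must sanitize
// some other way (e.g. server-side) rather than silently injecting HTML.
function staticInnerHTML(el, html) {
  if (typeof el.mozStaticInnerHTML === "function") {
    return el.mozStaticInnerHTML(html);
  }
  if (typeof el.webkitStaticInnerHTML === "function") {
    return el.webkitStaticInnerHTML(html);
  }
  throw new Error("no sanitizing setter available");
}
```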