Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-12-01 Thread Adam Barth
Your main point is well taken.

There are some technical reasons why tag whitelisting makes more sense
for inline content.  For example, consider the case you mentioned on
webkit-dev: @id.  Inline, @id is problematic because the ids exist in
a per-frame namespace, whereas they're harmless when the untrusted
content has an entire iframe to itself.  Of course, @id is not unique
in this respect.  For example, input type=password will likely get
autofilled by the password manager inline and @style can be used to
draw all over the page without an iframe's layout constraints.

That said, I'm not married to a design with a tag-level whitelist.  Do
you have a specific alternative in mind?

Adam


On Mon, Nov 30, 2009 at 7:43 PM, Maciej Stachowiak m...@apple.com wrote:

 On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:

 On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote:

 1) It seems like this API is harder to use than a sandboxed iframe. To use
 it correctly, you need to determine a whitelist of safe elements and
 attributes; providing an explicit whitelist at least of tags is mandatory.
 With a sandboxed iframe, as a Web developer you can just ask the browser to
 turn off unsafe things and not worry about designing a security policy.
 Besides ease of use, there is also the concern that a server-side filtering
 whitelist may be buggy, and if you apply the same whitelist on the client
 side as backup instead of doing something high level like disable
 scripting then you are less likely to benefit from defense in depth, since
 you may just replicate the bug.

 I should follow up with folks in the ruby-on-rails community to see
 how they view their sanitize API.  The one person I asked had a
 positive opinion, but we should get a bigger sample size.

 For server-side sanitization, this kind of explicit API is pretty much the
 only thing you can do.


 I think updateWithSanitizedHTML has different use cases than @sandbox.
 I think the killer applications for @sandbox are advertisements and
 gadgets.  In those cases, the developer wants most of the browser's
 functionality, but wants to turn off some dangerous stuff (like
 plug-ins).  For updateWithSanitizedHTML, the killer application is
 something like blog comments, where you basically want text with some
 formatting tags (bold, italics, and maybe images depending on the
 forum).

 I can imagine use cases where allowing very open-ended but script-free
 content is desirable. For example, consider a hosted blog service that wants
 to let blog authors write nearly arbitrary HTML, but without allowing
 script. @sandbox would not be a good solution for that use case. In general
 it does not seem sensible to me that the choice of tag whitelisting vs
 high-level feature whitelisting is tied to the choice of embedding content
 directly vs. creating a frame. Is there a technical reason these two choices
 have to be tied?


 2) It seems like this API loses one of the big benefits of sanitizing HTML
 in the browser implementation. Specifically, in theory it's safe to say
 allow everything except any construct that would result in script/code
 running. You can't do that on the server side - blacklisting is not sound
 because you can't predict the capabilities of all browsers. But the browser
 can predict its own capabilities. Sandboxed iframes do allow for this.

 The benefit is that you know you're getting the right parsing.  You're
 not going to be tripped up by img/src=javascript: and friends.

 It's true, this is a benefit. However, it seems like even if you whitelist
 tags, being able to say no script at a high level

 Also, this API is useful in cases where you don't have a server to help you
 sanitize your input.  One example I saw recently was a GreaseMonkey
 script that wanted to add EXIF metadata to Flickr.  Basically, the
 script grabbed the EXIF data from api.flickr.com and added it to the
 current page.  Unfortunately, that meant I could use this GreaseMonkey
 script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
 are other ways of solving the problem (I asked the developer to build
 the DOM in memory and use innerText), but you want something simple
 for these cases.

 If the EXIF metadata is supposed to be text-only, it seems like
 updateWithSanitizedHTML would not be easier to use than innerText, or in any
 way superior. For cases where it is actually desirable to allow some markup,
 it's not clear to me that giving explicit whitelists of what is allowed is
 the simple choice.


 I think the benefits of filtering by tag/attribute/scheme for advanced
 experts are outweighed by these two disadvantages for basic use, compared to
 something simple like the original staticInnerHTML idea. Another possible
 alternative is to express how to sanitize at a higher level, using something
 similar to sandboxed iframe feature strings.

 If you think of @sandbox as being optimized for rich untrusted content
 and 

Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-12-01 Thread Kornel Lesiński

The WebKit community is considering taking up such an experimental
implementation.  Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.


A whitelist requires developers to know about the potential risks of each
element/property, and that's not obvious to everyone: e.g. one might
want to allow object/embed (for harmless YouTube videos) without
realizing that it enables XSS.


It's also non-obvious that the style attribute is an XSS risk (via the
behavior property). A higher-level filtering option could allow the style
attribute and only filter out that property. The current proposal would
need another whitelist for CSS properties.


And even a whitelist for CSS properties couldn't be used to implement a
"no external access" policy (allow images with data: URLs, allow http:
links, but not http: images). This would be useful for webmail and
other places where a website doesn't want to allow third parties to
track views.
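
As a rough illustration of what such a policy would have to decide per URL, here is a minimal sketch. It is not part of the proposal: the names `UrlContext`, `schemeOf`, and `isAllowedUrl` are hypothetical, and the regex-based scheme check only stands in for the browser's real URL parser.

```typescript
// Hypothetical sketch of a "no external access" URL policy.
// None of these names come from the proposal under discussion.
type UrlContext = "image" | "link";

// Return the lowercase scheme of an absolute URL, or null if none is found.
function schemeOf(url: string): string | null {
  const match = /^\s*([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(url);
  const scheme = match?.[1];
  return scheme ? scheme.toLowerCase() : null;
}

// Images may only be inline (data:); links may point to http: or https:.
function isAllowedUrl(url: string, context: UrlContext): boolean {
  const scheme = schemeOf(url);
  if (scheme === null) {
    // Assumption: reject relative or unparseable URLs conservatively.
    return false;
  }
  if (context === "image") {
    return scheme === "data";
  }
  return scheme === "http" || scheme === "https";
}
```

Under these assumptions an http: image is rejected while an http: link is kept, which is a distinction a whitelist of tags, attributes, and CSS properties cannot express on its own.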


A "no clickjacking" option might be useful as well.

--
regards, Kornel Lesiński



Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-12-01 Thread Adam Barth
2009/12/1 Kornel Lesiński kor...@geekhood.net:
 The WebKit community is considering taking up such an experimental
 implementation.  Here's my current proposal for how this might work:


 http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

 I would appreciate any feedback on the design.

 A whitelist requires developers to know about the potential risks of each
 element/property, and that's not obvious to everyone: e.g. one might want to
 allow object/embed (for harmless YouTube videos) without realizing that it
 enables XSS.

That's true.  It would be interesting to know how often developers
screw this up with Ruby-on-Rails' version of the API.

 It's also non-obvious that the style attribute is an XSS risk (via the
 behavior property). A higher-level filtering option could allow the style
 attribute and only filter out that property. The current proposal would
 need another whitelist for CSS properties.

Script-in-CSS is subtle enough that it's explicitly blocked (like
javascript URLs).

 And even a whitelist for CSS properties couldn't be used to implement a
 "no external access" policy (allow images with data: URLs, allow http:
 links, but not http: images). This would be useful for webmail and other
 places where a website doesn't want to allow third parties to track views.

I don't think a "no external access" policy is worth supporting
explicitly.  If it falls out of a general design, that's great, but I
don't think the use case is compelling enough to accept the design
constraints required to support it.

 A "no clickjacking" option might be useful as well.

I don't have a clear idea how this would work.  Did you have something
different in mind than X-Frame-Options (already supported by WebKit)?

Adam


Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-12-01 Thread Kornel Lesiński


And even a whitelist for CSS properties couldn't be used to implement a
"no external access" policy (allow images with data: URLs, allow http:
links, but not http: images). This would be useful for webmail and other
places where a website doesn't want to allow third parties to track views.


I don't think a "no external access" policy is worth supporting
explicitly.  If it falls out of a general design, that's great, but I
don't think the use case is compelling enough to accept the design
constraints required to support it.


I think it is quite important for privacy. Otherwise web bugs can be  
placed and used to track every use of content in every mashup.


Most often I'd like formatted text in applications to be just text,  
_completely_ passive.



A "no clickjacking" option might be useful as well.


I don't have a clear idea how this would work.  Did you have something
different in mind than X-Frame-Options (already supported by WebKit)?



On second thought, clickjacking is probably not the right term for
what I have in mind, although it's a similar issue.


The problem is that content added to the DOM could use styles to overlay
the web application's chrome and steal data with forms or redirect
standard links/buttons to a phishing site, e.g. <form action=evil
style="position:fixed; top:0; right:0"> positioned on top of the
website's standard login form.


Position:fixed escapes elements with  
position:relative;overflow:hidden, so AFAIK this cannot be prevented  
without removal of all position:fixed styles from untrusted content.
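
To make the idea of removing such styles concrete, here is a naive sketch of a style-attribute filter. Everything in it is an assumption for illustration: the name `filterStyle` and the two blocklists are hypothetical, and splitting on ";" and ":" is far weaker than a real CSS parser (it does not handle CSS escapes, comments, or semicolons inside url() or strings).

```typescript
// Hypothetical sketch: drop declarations that let untrusted content
// escape its container or run script. Not a real CSS parser.
const BLOCKED_PROPERTIES = new Set(["behavior"]);
const BLOCKED_POSITION_VALUES = new Set(["fixed"]);

function filterStyle(style: string): string {
  const kept: string[] = [];
  for (const declaration of style.split(";")) {
    const colon = declaration.indexOf(":");
    if (colon === -1) continue; // skip empty or malformed declarations
    const property = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();
    if (property === "" || value === "") continue;
    if (BLOCKED_PROPERTIES.has(property)) continue;
    // Compare the value without a trailing "!important" and ignoring case.
    const normalized = value.toLowerCase().replace(/\s*!important$/, "").trim();
    if (property === "position" && BLOCKED_POSITION_VALUES.has(normalized)) continue;
    kept.push(`${property}: ${value}`);
  }
  return kept.join("; ");
}
```

A filter of this shape only works reliably inside the browser, where the real CSS parser decides what each declaration means; a string-level version like this one is exactly the kind of code that tends to be bypassed.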


Such a hack has been used on the auction site allegro.pl, where auction
descriptions are allowed to use CSS. In that particular case the content
should have been filtered server-side, but I imagine webmail, web-based
feed readers and all kinds of mashups dynamically loading untrusted
content could face similar problems, and having an iframe for every bit
of content is sometimes problematic.


--
regards, Kornel



Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-12-01 Thread Adam Barth
2009/12/1 Kornel Lesiński kor...@geekhood.net:
 And even a whitelist for CSS properties couldn't be used to implement a
 "no external access" policy (allow images with data: URLs, allow http:
 links, but not http: images). This would be useful for webmail and other
 places where a website doesn't want to allow third parties to track views.

 I don't think a "no external access" policy is worth supporting
 explicitly.  If it falls out of a general design, that's great, but I
 don't think the use case is compelling enough to accept the design
 constraints required to support it.

 I think it is quite important for privacy. Otherwise web bugs can be
 placed and used to track every use of content in every mashup.

 Most often I'd like formatted text in applications to be just text,
 _completely_ passive.

I agree that it's a nice benefit of some designs, but, in my opinion,
it's not nearly as important as addressing the security issues.

More concretely, suppose you want to let folks include hyperlinks in
sanitized HTML, which I suspect many people will want to do.  You've
already lost the battle against web bugs because of DNS prefetch.

 A "no clickjacking" option might be useful as well.

 I don't have a clear idea how this would work.  Did you have something
 different in mind than X-Frame-Options (already supported by WebKit)?

 On second thought, clickjacking is probably not the right term for what I
 have in mind, although it's a similar issue.

 The problem is that content added to the DOM could use styles to overlay
 the web application's chrome and steal data with forms or redirect standard
 links/buttons to a phishing site, e.g. <form action=evil
 style="position:fixed; top:0; right:0"> positioned on top of the website's
 standard login form.

 Position:fixed escapes elements with position:relative;overflow:hidden, so
 AFAIK this cannot be prevented without removal of all position:fixed styles
 from untrusted content.

 Such a hack has been used on the auction site allegro.pl, where auction
 descriptions are allowed to use CSS. In that particular case the content
 should have been filtered server-side, but I imagine webmail, web-based
 feed readers and all kinds of mashups dynamically loading untrusted content
 could face similar problems, and having an iframe for every bit of content
 is sometimes problematic.

I agree that this is a threat worth addressing.  That's one reason why
an API that blocks only script is insufficient for inline use cases.
Notice that my proposal does mitigate this threat.

Adam


[whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Adam Barth
On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson i...@hixie.ch wrote:
 Defining a spec-blessed whitelist of elements, attributes, and attribute
 values and filtering at the parser level is a significant new feature.
 While I see that it has value, I think in the short term it would be
 better to wait for a future version of HTML before introducing this
 feature; ideally once we have more implementation experience with
 experimental versions of this idea.

 I would encourage browser vendors to introduce APIs similar to that
 discussed below, clearly marked as vendor-specific (e.g. for Firefox,
 something like .mozStaticInnerHTML).

The WebKit community is considering taking up such an experimental
implementation.  Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.

Thanks,
Adam


Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Maciej Stachowiak


On Nov 30, 2009, at 3:55 PM, Adam Barth wrote:


On Fri, Jun 5, 2009 at 5:09 PM, Ian Hickson i...@hixie.ch wrote:
Defining a spec-blessed whitelist of elements, attributes, and attribute
values and filtering at the parser level is a significant new feature.
While I see that it has value, I think in the short term it would be
better to wait for a future version of HTML before introducing this
feature; ideally once we have more implementation experience with
experimental versions of this idea.

I would encourage browser vendors to introduce APIs similar to that
discussed below, clearly marked as vendor-specific (e.g. for Firefox,
something like .mozStaticInnerHTML).


The WebKit community is considering taking up such an experimental
implementation.  Here's my current proposal for how this might work:

http://docs.google.com/Doc?docid=0AZpchfQ5mBrEZGQ0cDh3YzRfMTJzbTY1cWJrNAhl=en

I would appreciate any feedback on the design.


I neglected to give feedback on webkit-dev, but here are my comments:

1) It seems like this API is harder to use than a sandboxed iframe. To  
use it correctly, you need to determine a whitelist of safe elements  
and attributes; providing an explicit whitelist at least of tags is  
mandatory. With a sandboxed iframe, as a Web developer you can just  
ask the browser to turn off unsafe things and not worry about  
designing a security policy. Besides ease of use, there is also the  
concern that a server-side filtering whitelist may be buggy, and if  
you apply the same whitelist on the client side as backup instead of  
doing something high level like disable scripting then you are less  
likely to benefit from defense in depth, since you may just replicate  
the bug.


2) It seems like this API loses one of the big benefits of sanitizing  
HTML in the browser implementation. Specifically, in theory it's safe  
to say allow everything except any construct that would result in  
script/code running. You can't do that on the server side -  
blacklisting is not sound because you can't predict the capabilities  
of all browsers. But the browser can predict its own capabilities.  
Sandboxed iframes do allow for this.


I think the benefits of filtering by tag/attribute/scheme for advanced  
experts are outweighed by these two disadvantages for basic use,  
compared to something simple like the original staticInnerHTML idea.  
Another possible alternative is to express how to sanitize at a higher  
level, using something similar to sandboxed iframe feature strings.


Here's a problem that exists with both this API and also  
innerStaticHTML:


3) There is no secure and efficient way to append sanitized contents
to an element that already has children. This may result in authors
appending with innerHTML += (inefficient and insecure!) or
insertAdjacentHTML() (efficient but still insecure!). I'm willing to
concede that use cases other than "replace existing contents" and
"append to existing contents" are fairly exotic.


Regards,
Maciej



Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Adam Barth
On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote:
 1) It seems like this API is harder to use than a sandboxed iframe. To use
 it correctly, you need to determine a whitelist of safe elements and
 attributes; providing an explicit whitelist at least of tags is mandatory.
 With a sandboxed iframe, as a Web developer you can just ask the browser to
 turn off unsafe things and not worry about designing a security policy.
 Besides ease of use, there is also the concern that a server-side filtering
 whitelist may be buggy, and if you apply the same whitelist on the client
 side as backup instead of doing something high level like disable
 scripting then you are less likely to benefit from defense in depth, since
 you may just replicate the bug.

I should follow up with folks in the ruby-on-rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.

I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).

 2) It seems like this API loses one of the big benefits of sanitizing HTML
 in the browser implementation. Specifically, in theory it's safe to say
 allow everything except any construct that would result in script/code
 running. You can't do that on the server side - blacklisting is not sound
 because you can't predict the capabilities of all browsers. But the browser
 can predict its own capabilities. Sandboxed iframes do allow for this.

The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by img/src=javascript: and friends.  Also,
this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a GreaseMonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this GreaseMonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.
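
For the text-only case, one other way of solving the problem (besides building the DOM in memory and using innerText) is to escape the untrusted string before it is treated as markup. The sketch below is only an illustration of that alternative; the helper name `escapeHtml` is hypothetical and is not the fix that was actually applied to the script.

```typescript
// Hypothetical helper: escape untrusted text so that it renders as text
// rather than being parsed as markup. "&" is replaced first to avoid
// double-escaping the entities produced by the later replacements.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

This covers untrusted data placed in element content or quoted attribute values; it does nothing for cases where some markup is meant to survive, which is where a sanitizing API comes in.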

 I think the benefits of filtering by tag/attribute/scheme for advanced
 experts are outweighed by these two disadvantages for basic use, compared to
 something simple like the original staticInnerHTML idea. Another possible
 alternative is to express how to sanitize at a higher level, using something
 similar to sandboxed iframe feature strings.

If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already.  The
feature string Slashdot wants for its comments is (a b strong i em,
href), but another message board might want something different.
For example, 4chan might want (img, src alt).  I don't think these
require particularly advanced experts to understand.
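
To show roughly how a feature string like the ones above could drive filtering, here is a sketch over a toy node tree instead of a real DOM. All names (`ElementNode`, `parseFeatureString`, `sanitize`) are hypothetical, the "tags, attributes" syntax is read off the examples above, and unwrapping a disallowed element into its children is an assumption; the proposal may well drop such elements entirely.

```typescript
// Hypothetical data model standing in for parsed HTML; not a real DOM.
interface ElementNode {
  tag: string;
  attributes: Record<string, string>;
  children: ContentNode[];
}
type ContentNode = string | ElementNode;

interface Allowlist {
  tags: Set<string>;
  attributes: Set<string>;
}

// Parse a feature string such as "a b strong i em, href":
// allowed tags before the comma, allowed attributes after it.
function parseFeatureString(feature: string): Allowlist {
  const [tagPart = "", attributePart = ""] = feature.split(",");
  const words = (part: string): string[] =>
    part.trim().toLowerCase().split(/\s+/).filter((word) => word !== "");
  return { tags: new Set(words(tagPart)), attributes: new Set(words(attributePart)) };
}

// Keep allowed elements with only their allowed attributes. A disallowed
// element is replaced by its sanitized children (an assumption, see above).
function sanitize(nodes: ContentNode[], allow: Allowlist): ContentNode[] {
  const result: ContentNode[] = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      result.push(node); // text nodes are always kept
      continue;
    }
    const children = sanitize(node.children, allow);
    if (!allow.tags.has(node.tag.toLowerCase())) {
      result.push(...children);
      continue;
    }
    const attributes: Record<string, string> = {};
    for (const [name, value] of Object.entries(node.attributes)) {
      if (allow.attributes.has(name.toLowerCase())) attributes[name] = value;
    }
    result.push({ tag: node.tag, attributes, children });
  }
  return result;
}
```

The sketch leaves out attribute-value checks, so an href with a javascript: URL would pass through here; blocking those schemes would be a separate step.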

 Here's a problem that exists with both this API and also innerStaticHTML:

 3) There is no secure and efficient way to append sanitized contents to an
 element that already has children. This may result in authors appending with
 innerHTML += (inefficient and insecure!) or insertAdjacentHTML() (efficient
 but still insecure!). I'm willing to concede that use cases other than
 "replace existing contents" and "append to existing contents" are fairly
 exotic.

Maybe we need insertAdjacentSanitizedHTML instead or in addition.  ;)

Adam


Re: [whatwg] updateWithSanitizedHTML (was Re: innerStaticHTML)

2009-11-30 Thread Maciej Stachowiak


On Nov 30, 2009, at 6:32 PM, Adam Barth wrote:

On Mon, Nov 30, 2009 at 5:43 PM, Maciej Stachowiak m...@apple.com wrote:
1) It seems like this API is harder to use than a sandboxed iframe. To use
it correctly, you need to determine a whitelist of safe elements and
attributes; providing an explicit whitelist at least of tags is mandatory.
With a sandboxed iframe, as a Web developer you can just ask the browser to
turn off unsafe things and not worry about designing a security policy.
Besides ease of use, there is also the concern that a server-side filtering
whitelist may be buggy, and if you apply the same whitelist on the client
side as backup instead of doing something high level like disable
scripting then you are less likely to benefit from defense in depth, since
you may just replicate the bug.


I should follow up with folks in the ruby-on-rails community to see
how they view their sanitize API.  The one person I asked had a
positive opinion, but we should get a bigger sample size.


For server-side sanitization, this kind of explicit API is pretty much  
the only thing you can do.




I think updateWithSanitizedHTML has different use cases than @sandbox.
I think the killer applications for @sandbox are advertisements and
gadgets.  In those cases, the developer wants most of the browser's
functionality, but wants to turn off some dangerous stuff (like
plug-ins).  For updateWithSanitizedHTML, the killer application is
something like blog comments, where you basically want text with some
formatting tags (bold, italics, and maybe images depending on the
forum).


I can imagine use cases where allowing very open-ended but script-free  
content is desirable. For example, consider a hosted blog service that  
wants to let blog authors write nearly arbitrary HTML, but without  
allowing script. @sandbox would not be a good solution for that use  
case. In general it does not seem sensible to me that the choice of  
tag whitelisting vs high-level feature whitelisting is tied to the  
choice of embedding content directly vs. creating a frame. Is there a  
technical reason these two choices have to be tied?




2) It seems like this API loses one of the big benefits of sanitizing HTML
in the browser implementation. Specifically, in theory it's safe to say
allow everything except any construct that would result in script/code
running. You can't do that on the server side - blacklisting is not sound
because you can't predict the capabilities of all browsers. But the browser
can predict its own capabilities. Sandboxed iframes do allow for this.


The benefit is that you know you're getting the right parsing.  You're
not going to be tripped up by img/src=javascript: and friends.


It's true, this is a benefit. However, it seems like even if you  
whitelist tags, being able to say no script at a high level


Also, this API is useful in cases where you don't have a server to help you
sanitize your input.  One example I saw recently was a GreaseMonkey
script that wanted to add EXIF metadata to Flickr.  Basically, the
script grabbed the EXIF data from api.flickr.com and added it to the
current page.  Unfortunately, that meant I could use this GreaseMonkey
script to XSS Flickr by adding HTML to my EXIF metadata.  Sure, there
are other ways of solving the problem (I asked the developer to build
the DOM in memory and use innerText), but you want something simple
for these cases.


If the EXIF metadata is supposed to be text-only, it seems like  
updateWithSanitizedHTML would not be easier to use than innerText, or  
in any way superior. For cases where it is actually desirable to allow  
some markup, it's not clear to me that giving explicit whitelists of  
what is allowed is the simple choice.




I think the benefits of filtering by tag/attribute/scheme for advanced
experts are outweighed by these two disadvantages for basic use, compared to
something simple like the original staticInnerHTML idea. Another possible
alternative is to express how to sanitize at a higher level, using something
similar to sandboxed iframe feature strings.


If you think of @sandbox as being optimized for rich untrusted content
and updateWithSanitizedHTML as being optimized for poor untrusted
content, then you'll see that's what the API does already.  The
feature string Slashdot wants for its comments is (a b strong i em,
href), but another message board might want something different.
For example, 4chan might want (img, src alt).  I don't think these
require particularly advanced experts to understand.


updateWithSanitizedHTML and @sandbox both provide features that the  
other does not for reasons that do not seem technically necessary. For  
example, updateWithSanitizedHTML could easily have an allow  
everything except script mode, and @sandbox could easily allow per- 
tag whitelisting. Then the choice would be between the resource cost  
of a frame, and the sandboxing features that it's