Re: [whatwg] More prohibited characters for unquoted attributes are needed
Ian Hickson wrote:
> On Mon, 7 Sep 2009, Aryeh Gregor wrote:
>> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon foolistbar at googlemail.com wrote:
>>> Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another.
>>
>> I agree with you as an author. I wrote an HTML output function in MediaWiki assuming that what the standard says is known to be interoperable, which is apparently wrong. If I hadn't been keeping up with HTML 5, I would have introduced an XSS vulnerability because of some browsers' handling of `. If the problem will go away with time, then perhaps a later version of the standard could make such unquoted attributes conforming, once there's no more problem with them.
>
> As far as I can tell, this is an IE bug; treating ` as an attribute quoting character is non-conforming in any version of HTML so far, it seems. I'm certainly not going to make it non-conforming to stumble into any IE bug or difference in parsing between IE and previous specs or other browsers; we'd just end up with an asinine set of conformance requirements.

I agree that it's pointless to make it non-conforming to hit any parsing bug, but I would argue that we should make non-conforming as many cases as is sensible when they open up security holes in websites on legacy UAs, given that the website uses an HTML 5 parser/sanitizer/serializer.

> For example, should this be non-conforming?
>
>     <!DOCTYPE html>
>     <title>Test</title>
>     <form>
>     <label>Search: <input type=text></label>
>     <input type=submit>
>     </form>
>
> This perfectly innocent piece of HTML content (HTML2-compliant except for the DOCTYPE) results in a non-tree DOM in IE8. Should we make it non-conforming?

No, it opens up no security hole if that is done.

> Similarly, IE conditional comments make it trivial to trigger scripts in IE but not another UA; indeed people do this on purpose. Should we make those non-conforming also?

They are a harder issue, but I think it is probably fair enough to assume that most sanitizers drop comments for such reasons, hence making them fine to leave as conforming also.

> As I understand it, the attack here is a site that allows the user to input text that is used verbatim in two attributes, such that the user can set the first attribute's value to:
>
>     `
>
> ...and the second to:
>
>     ` onload='...payload...' end=x
>
> ...with the assumption that the site is going to not quote the first one, and quote the second one with double quotes:

(This is the default behaviour of Python html5lib, FWIW: the first is not quoted as it does not contain any whitespace characters or U+003E (>), the latter is quoted for that reason.)

>     <body title=` class="` onload='...payload...' end=x">
>
> ...which in IE, for some reason, gets treated as:
>
>     <body title=' class="' onload='...payload...' end='x"'>

Indeed, this is the attack I (and others) am concerned about.

> I've disallowed ` in unquoted attribute values for now, but I think we should revert this once IE has fixed this bug for a few years.

Right, once versions of IE with this bug have faded out of existence I think this will become a non-issue. I also expect that'll be a while yet, though, and I highly doubt that time will have come even by the time when HTML 5 goes to REC. Furthermore, if there are similar attacks to this, I think they should similarly be made non-conforming.

--
Geoffrey Sneddon, Opera Software
http://gsnedders.com/
http://www.opera.com/
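The quoting decision discussed in the message above can be sketched in a few lines of Python. This is only an illustration, not html5lib's actual code: the names `UNSAFE_UNQUOTED`, `needs_quotes` and `serialize_attr` are hypothetical, and the character set is an assumption that follows the conservative position in the thread by treating the backtick as a character that forces quoting.

```python
# Hypothetical set of characters that force quoting in this sketch:
# whitespace, both quote characters, "=", "<", ">", plus the backtick
# (U+0060) that the thread reports IE treating as a quote character.
UNSAFE_UNQUOTED = set(" \t\n\f\r\"'=<>`")


def needs_quotes(value: str) -> bool:
    """Return True if the value should not be emitted unquoted."""
    return value == "" or any(ch in UNSAFE_UNQUOTED for ch in value)


def serialize_attr(name: str, value: str) -> str:
    """Serialize one attribute, using double quotes only when required."""
    escaped = value.replace("&", "&amp;")
    if not needs_quotes(value):
        return f"{name}={escaped}"
    escaped = escaped.replace('"', "&quot;")
    return f'{name}="{escaped}"'


# The two attacker-controlled values described in the thread.
title = "`"
cls = "` onload='...payload...' end=x"
print("<body " + serialize_attr("title", title) + " " + serialize_attr("class", cls) + ">")
```

With the backtick in the unsafe set, both values end up inside double quotes, so the lone backtick in `title` can no longer open a quoted run in a UA that treats it as a quote character. Removing the backtick from `UNSAFE_UNQUOTED` reproduces the unquoted `title` from the attack markup.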
Re: [whatwg] More prohibited characters for unquoted attributes are needed
On Mon, 7 Sep 2009, Aryeh Gregor wrote:
> On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon foolist...@googlemail.com wrote:
>> Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another.
>
> I agree with you as an author. I wrote an HTML output function in MediaWiki assuming that what the standard says is known to be interoperable, which is apparently wrong. If I hadn't been keeping up with HTML 5, I would have introduced an XSS vulnerability because of some browsers' handling of `. If the problem will go away with time, then perhaps a later version of the standard could make such unquoted attributes conforming, once there's no more problem with them.

As far as I can tell, this is an IE bug; treating ` as an attribute quoting character is non-conforming in any version of HTML so far, it seems. I'm certainly not going to make it non-conforming to stumble into any IE bug or difference in parsing between IE and previous specs or other browsers; we'd just end up with an asinine set of conformance requirements.

For example, should this be non-conforming?

    <!DOCTYPE html>
    <title>Test</title>
    <form>
    <label>Search: <input type=text></label>
    <input type=submit>
    </form>

This perfectly innocent piece of HTML content (HTML2-compliant except for the DOCTYPE) results in a non-tree DOM in IE8. Should we make it non-conforming?

Similarly, IE conditional comments make it trivial to trigger scripts in IE but not another UA; indeed people do this on purpose. Should we make those non-conforming also?

As I understand it, the attack here is a site that allows the user to input text that is used verbatim in two attributes, such that the user can set the first attribute's value to:

    `

...and the second to:

    ` onload='...payload...' end=x

...with the assumption that the site is going to not quote the first one, and quote the second one with double quotes:

    <body title=` class="` onload='...payload...' end=x">

...which in IE, for some reason, gets treated as:

    <body title=' class="' onload='...payload...' end='x"'>

I've disallowed ` in unquoted attribute values for now, but I think we should revert this once IE has fixed this bug for a few years.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] More prohibited characters for unquoted attributes are needed
On Sun, 6 Sep 2009, Aryeh Gregor wrote:
> See some research here: http://code.google.com/p/html5lib/issues/detail?id=93
>
> It seems like in addition to whitespace and '= , the characters U+ through U+0020 should be banned from unquoted attribute values, as well as U+0060 (backtick `), for the sake of compatibility.

On Mon, 7 Sep 2009, Geoffrey Sneddon wrote:
> Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another.

The right fix here is to have the browsers all implement the same parser algorithm. Validators are welcome to warn about this case, though.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
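A validator-style warning of the kind mentioned above might look roughly like the following. It is a sketch under stated assumptions: the function name `unquoted_value_warnings` is hypothetical, and because the lower bound of the proposed range is elided in the quoted message, the code simply flags every code point at or below U+0020 alongside U+0060.

```python
def unquoted_value_warnings(value: str) -> list:
    """Return warnings for characters that are risky in an unquoted value.

    Assumption: everything at or below U+0020 is flagged, since the quoted
    proposal only gives the upper bound of the range.
    """
    warnings = []
    for index, ch in enumerate(value):
        code = ord(ch)
        if code <= 0x20:
            warnings.append(
                f"U+{code:04X} at index {index}: control or space character"
            )
        elif ch == "`":
            warnings.append(
                f"U+0060 at index {index}: backtick, "
                "treated as a quote character by some legacy UAs"
            )
    return warnings


for message in unquoted_value_warnings("a`b c"):
    print(message)
```

Returning messages rather than raising keeps the check advisory, which matches the suggestion that validators may warn about this case without the markup being non-conforming.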
Re: [whatwg] More prohibited characters for unquoted attributes are needed
On Mon, Sep 7, 2009 at 1:34 PM, Geoffrey Sneddon foolist...@googlemail.com wrote:
> Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another.

I agree with you as an author. I wrote an HTML output function in MediaWiki assuming that what the standard says is known to be interoperable, which is apparently wrong. If I hadn't been keeping up with HTML 5, I would have introduced an XSS vulnerability because of some browsers' handling of `. If the problem will go away with time, then perhaps a later version of the standard could make such unquoted attributes conforming, once there's no more problem with them.
Re: [whatwg] More prohibited characters for unquoted attributes are needed
On 6 Sep 2009, at 12:35, Aryeh Gregor wrote:
> See some research here: http://code.google.com/p/html5lib/issues/detail?id=93
>
> It seems like in addition to whitespace and '= , the characters U+ through U+0020 should be banned from unquoted attribute values, as well as U+0060 (backtick `), for the sake of compatibility.

Apparently Hixie had previously said he didn't want to change this as it will become a non-issue over time. I think it does matter due to the security issues it presents in existing UAs. Conforming markup (using elements/attributes allowed in HTML 4.01) should not cause JS to execute in one browser but not in another.

--
Geoffrey Sneddon
http://gsnedders.com/