On 12 October 2017 at 15:19, Sam Whited <s...@samwhited.com> wrote:
> On Thu, Oct 12, 2017, at 03:09, Dave Cridland wrote:
>> I would note that in principle, a content security policy ought to
>> prevent such attacks outright.
>>
>> But there would, probably, remain several other innovative attacks,
>> such as passing client-specific markup intended to duplicate existing
>> UI elements.
>
> Indeed. Using a restricted subset of a complicated system always
> introduces the risk that some part of that complexity will not be
> understood and will leak out, possibly causing security issues. We see
> that on the web fairly regularly.
>
> It's my belief that it's always better to use a simple, complete system
> instead of a restricted, complex system. We see the same thing with
> XMPP's use of XML: we may use a sane subset of it, but since the
> underlying libraries still handle things like proc insts and whatever
> the ampersand escape thing is called you still get attacks based on
> those every so often (even though they're forbidden in XMPP).
>
> I didn't bring this up in the original mail because it tends to get a
> bit abstract, but it's worth discussing if we move to make a
> replacement.
>

I think the problem isn't simply that it's a subset of a complex system;
it's that sanitizing HTML is a difficult and error-prone problem which
has repeatedly been the cause of security problems.

I appreciate it's entirely possible, but even a simplified ruleset is
something like:

1) For each child element:
  a) Discard if this is an unsupported element.
  b) Remove any unsupported attributes.
  c) For the style attribute, parse the CSS and:
    i) Remove any unsupported properties.
    ii) For properties which (might) contain a URL, ensure the URL is
        of a scheme we think might be OK, although we won't tell you
        which those are.
  d) For each remaining HTML attribute which (might) contain a URL,
     ensure that any URL is of a scheme we think might be OK, although
     we won't tell you which those are.
  e) Recurse for each child element.

>> So overall, I think we should move rich IM formatting to Markdown and
>> call it done.
>
> Let's discuss this in a separate thread. I'd really like to try and keep
> this about deprecating XHTML-IM, which I think is an orthogonal track of
> work (unless you disagree, in which case, please voice that here!).

It's clearly not orthogonal, since simply getting rid of XHTML-IM is
not deprecating it in favour of anything else.

But several clients have supported a basic Markdown-like syntax for
emphasis for years. Gajim, for example, supports both *bold* and
/italic/ at a quick test, and I think it has done so for a long time.
Slack does fine on just a handful more items (`preformat`, for
example).

I appreciate Goffi's argument that Markdown-like syntaxes do not handle
tables, but guess what? Nor does XHTML-IM.

So my argument for keeping it in this thread is really to understand
which features of XHTML-IM are desirable, rather than to fully specify
a replacement. Once we know that we want XHTML-IM's feature set to
support bold, or tables, or inline images, or whatever, then we can
move on to design a replacement.

Dave.
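
For illustration only, the simplified ruleset above might be sketched in Python roughly as follows. Every allow-list here (elements, attributes, CSS names, URL schemes) is an assumption made up for the example, since the ruleset deliberately does not say which ones are acceptable. The sketch also ignores XML namespaces and "parses" CSS by naively splitting on `;` and `:`, which is exactly the kind of shortcut that tends to go wrong in practice.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical allow-lists: none of these come from the original mail.
ALLOWED_ELEMENTS = {"p", "span", "em", "strong", "a", "img", "br"}
ALLOWED_ATTRS = {"style", "href", "src", "alt"}
URL_ATTRS = {"href", "src"}
ALLOWED_CSS = {"color", "font-weight", "font-style", "background-image"}
ALLOWED_SCHEMES = {"http", "https", "xmpp", "mailto"}
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]*)['\"]?\s*\)", re.I)


def url_ok(url):
    """Accept only absolute URLs whose scheme is explicitly allowed."""
    try:
        return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:  # urlsplit rejects some malformed URLs outright
        return False


def clean_style(style):
    """Step c: keep allowed declarations, checking any URL they carry."""
    kept = []
    for decl in style.split(";"):  # naive split, not a real CSS parser
        name, sep, value = decl.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or name not in ALLOWED_CSS or "\\" in value:
            continue  # unsupported name, or CSS escapes we refuse to decode
        match = CSS_URL.fullmatch(value)
        if "(" in value and not (match and url_ok(match.group(1))):
            continue  # only a single url(...) with an allowed scheme passes
        kept.append(f"{name}: {value}")
    return "; ".join(kept)


def sanitize(elem):
    """Step 1: clean the children of elem in place and return elem."""
    prev = None
    for child in list(elem):
        if child.tag not in ALLOWED_ELEMENTS:  # step a
            # Drop the element and its content, but keep the text after it.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
            continue
        for name in list(child.attrib):
            value = child.attrib[name]
            if name not in ALLOWED_ATTRS:  # step b
                del child.attrib[name]
            elif name == "style":  # step c
                cleaned = clean_style(value)
                if cleaned:
                    child.attrib[name] = cleaned
                else:
                    del child.attrib[name]
            elif name in URL_ATTRS and not url_ok(value):  # step d
                del child.attrib[name]
        sanitize(child)  # step e
        prev = child
    return elem


doc = ET.fromstring(
    '<body>Hi <script>alert(1)</script>there '
    '<a href="javascript:alert(1)">link</a></body>'
)
print(ET.tostring(sanitize(doc), encoding="unicode"))
```

Even this toy version needs a handful of judgment calls (what to do with text following a discarded element, how to treat CSS escapes, whether relative URLs are acceptable), and each one is a place where an implementation could plausibly get it wrong.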