Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-16 Thread Ian Hickson
On Tue, 16 Dec 2008, Edward Z. Yang wrote:
> Ian Hickson wrote:
> > Mostly, yes. (There are exceptions, but they're not things you'd really
> > want to be using anyway, e.g. obscure SGML features.)
>
> Are these exceptions, by any chance, documented somewhere?
http://wiki.whatwg.org/wiki/Diff

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-16 Thread Edward Z. Yang
Ian Hickson wrote:
> Mostly, yes. (There are exceptions, but they're not things you'd really
> want to be using anyway, e.g. obscure SGML features.)
Are these exceptions, by any chance, documented somewhere?
Cheers, Edward

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-16 Thread Charles McCathieNevile
On Mon, 15 Dec 2008 22:36:24 +0100, Martin Atkins wrote: Edward Z. Yang wrote: Ian Hickson wrote: I'm not saying don't be standards-compliant; I'm just saying use a subset of HTML5 that you feel comfortable with (which might also be a subset of HTML4, for that matter, just with the HTML5

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-16 Thread Iñigo
> Edward Z. Yang: > > > Sounds good, since HTML4 is a strict subset of HTML5 (correct me if I'm > > > wrong?) > > Ian Hickson: > > Mostly, yes. (There are exceptions, but they're not things you'd really > > want to be using anyway, e.g. obscure SGML features.) > > Note though that it's not possible

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Cameron McCormack
Edward Z. Yang: > > Sounds good, since HTML4 is a strict subset of HTML5 (correct me if I'm > > wrong?) Ian Hickson: > Mostly, yes. (There are exceptions, but they're not things you'd really > want to be using anyway, e.g. obscure SGML features.) Note though that it’s not possible to write a do

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Ian Hickson
On Mon, 15 Dec 2008, Edward Z. Yang wrote: > Ian Hickson wrote: > > I'm not saying don't be standards-compliant; I'm just saying use a subset > > of HTML5 that you feel comfortable with (which might also be a subset of > > HTML4, for that matter, just with the HTML5 DOCTYPE so that you don't have

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Tab Atkins Jr.
On Mon, Dec 15, 2008 at 3:32 PM, Edward Z. Yang wrote: > Ian Hickson wrote: >> I'm not saying don't be standards-compliant; I'm just saying use a subset >> of HTML5 that you feel comfortable with (which might also be a subset of >> HTML4, for that matter, just with the HTML5 DOCTYPE so that you do

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Martin Atkins
Edward Z. Yang wrote: Ian Hickson wrote: I'm not saying don't be standards-compliant; I'm just saying use a subset of HTML5 that you feel comfortable with (which might also be a subset of HTML4, for that matter, just with the HTML5 DOCTYPE so that you don't have to worry about exactly which ve

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
Ian Hickson wrote: > I'm not saying don't be standards-compliant; I'm just saying use a subset > of HTML5 that you feel comfortable with (which might also be a subset of > HTML4, for that matter, just with the HTML5 DOCTYPE so that you don't have > to worry about exactly which version you want t

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Ian Hickson
On Mon, 15 Dec 2008, Edward Z. Yang wrote: > > > I wouldn't really worry about "4" vs "5". What matters is what works > > in browsers, or whatever tools your users are using. (This is one > > reason in HTML5 we do away with having the version number in the > > DOCTYPE.) I'd recommend just using

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
Ian Hickson wrote: > Oh well that's just a matter of having pluggable modules for different > things to filter. You can equally support SVG and MathML in this way. You > just need the core processing to be made independent of the filtering. I just realized an error in my thought that I would nee

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Ian Hickson
On Mon, 15 Dec 2008, Edward Z. Yang wrote: > > In theory, I could write separate sanitizers for HTML 4, XHTML 1.0, > XHTML 2.0, HTML 5, etc. In practice, I want to reuse as much code as > possible between these cases, since I'm a lazy developer. Perhaps > "extensibility" is not the right word h

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
Ian Hickson wrote: > I don't really see why a sanitiser needs extensibility though. Could you > elaborate on this? Surely you just want to filter anything that isn't > valid or safe, and only leave the valid safe stuff, using a whitelist. In theory, I could write separate sanitizers for HTML 4,

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Ian Hickson
On Mon, 15 Dec 2008, Edward Z. Yang wrote: > Ian Hickson wrote: > > In general you should be able to just implement what the spec says and > > then either leave the HTML5 support in (it's unlikely to cause any harm) > > or just comment out the support for the new elements, that should be > > rel

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
James Graham wrote: > Nothing in section 8 is going to ensure that you get output that passes > a conformance check. If you do transform the output into something that > is conforming then you have to make up the rules yourself Yes, which I suppose is slightly concerning. My philosophy is to first

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
Ian Hickson wrote: > In general you should be able to just implement what the spec says and > then either leave the HTML5 support in (it's unlikely to cause any harm) > or just comment out the support for the new elements, that should be > relatively easy. Right, this is mostly what I intended

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread Edward Z. Yang
Geoffrey Sneddon wrote: > If you do start work on a PHP implementation, please do seriously > consider adding it to the html5lib project (which currently contains > Python and Ruby implementations) as MIT licensed — there are also a fair > number of test cases there. I'd be quite interested in reu

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-15 Thread James Graham
Edward Z. Yang wrote: The reason I'd like to know this is because I am the author of a tool named HTML Purifier, which takes user-input HTML and cleans it for standards-compliance as well as XSS. We insist on output being standards compliant, because the result is unambiguous. Nothing in sec

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-14 Thread Ian Hickson
On Sun, 14 Dec 2008, Edward Z. Yang wrote: > > I was curious to know how stable/complete HTML 5's tokenizing and DOM > algorithms are (specifically section 8). Pretty stable. There are some known issues [1], and more issues will surely be found as implementations grow in usage, but the basic a

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-14 Thread Geoffrey Sneddon
On 14 Dec 2008, at 21:55, Edward Z. Yang wrote: Are there any specific differences that pose problems? Not that I know of yet, since I haven't started on an implementation yet. Which brings me back to my original question: how stable is section 8? I would rather not be chasing a moving tar

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-14 Thread Edward Z. Yang
Anne van Kesteren wrote: > Could you explain what is not sufficient about the "Parsing HTML > fragments" section: I must admit, I had not seen that section! That seems to be quite sufficient. My bad. :o) > Are there any specific differences that pose problems? Not that I know of yet, since I

Re: [whatwg] Stability of tokenizing/dom algorithms

2008-12-14 Thread Anne van Kesteren
On Sun, 14 Dec 2008 22:37:40 +0100, Edward Z. Yang wrote: 1. Users input HTML fragments, not actual HTML documents. A parser I would use needs to be able to enter parsing in a specific state, and has to ignore any requests by the user to exit that state (i.e. a tag) Could you explain what

[whatwg] Stability of tokenizing/dom algorithms

2008-12-14 Thread Edward Z. Yang
Hello all, I was curious to know how stable/complete HTML 5's tokenizing and DOM algorithms are (specifically section 8). A cursory glance through the section reveals a few red warning boxes, but these are largely issues of whether or not the specification should follow browser implementations, an