On Thu, 16 Feb 2006, Tim Altman wrote:
>
> OK. Assuming the HTML5 document is served with a text/html doctype, how would
> the following markup be parsed?
>
> <table>
>   <tr>
>     <td>
>       <canvas/>
>       <p>Foo</p>
>     </td>
>   </tr>
> </table>

You omitted the DOCTYPE, which makes it a "difficult parse error" and thus
isn't currently defined (i.e. it triggers Quirks mode). Assuming the document
started with "<!DOCTYPE HTML>", though, and ignoring all whitespace (nothing
interesting happens with whitespace):

 * Tree Construction starts in the Initial Phase.
 * A DOCTYPE token marked as being correct
    -- Append a DocumentType node to the Document node.
    -- Switch to the Root Element Phase.
 * A start tag token (<table>)
    -- Append an <html> element to the Document node.
    -- Switch to the Main Phase.
 * Main Phase state:
    -- Insertion mode is "before head".
    -- Stack of open elements has just <html>.
    -- Reprocess the token:
 * "Anything else" (<table>) in "before head"
    -- Act as if <head> had been seen:
 * A start tag token with the tag name "head"
    -- Append a <head> element to the <html> element.
    -- Stack of open elements has <html><head>.
    -- Switch to the "in head" insertion mode.
    -- Reprocess the token:
 * "Anything else" (<table>) in "in head"
    -- Act as if </head> had been seen:
 * An end tag token with the tag name "head"
    -- Stack of open elements again has just <html>.
    -- Change the insertion mode to "after head".
    -- Reprocess the token:
 * "Anything else" (<table>) in "after head"
    -- Act as if <body> had been seen:
 * A start tag token with the tag name "body"
    -- Append a <body> element to the <html> element.
    -- Stack of open elements has <html><body>.
    -- Switch to the "in body" insertion mode.
    -- Reprocess the token:
 * A start tag whose tag name is "table"
    -- Append a <table> element to the <body> element.
    -- Stack of open elements has <html><body><table>.
    -- Switch to the "in table" insertion mode.
 * A start tag whose tag name is one of: "td", "th", "tr"
    -- Act as if <tbody> had been seen:
 * A start tag whose tag name is one of: "tbody", "tfoot", "thead"
    -- Append a <tbody> element to the <table> element.
    -- Stack of open elements has <html><body><table><tbody>.
    -- Switch to the "in table body" insertion mode.
    -- Reprocess the token:
 * A start tag whose tag name is "tr"
    -- Append a <tr> element to the <tbody> element.
    -- Stack of open elements has <html><body><table><tbody><tr>.
    -- Switch to the "in row" insertion mode.
 * A start tag whose tag name is one of: "th", "td"
    -- Append a <td> element to the <tr> element.
    -- Stack of open elements has <html><body><table><tbody><tr><td>.
    -- Switch to the "in cell" insertion mode.
 * Anything else (<canvas>) in "in cell"
    -- Process as if it was "in body":
 * A start tag token not covered by the previous entries (<canvas>)
    -- Append a <canvas> element to the <td> element.
    -- Stack of open elements has:
       <html><body><table><tbody><tr><td><canvas>
 * Anything else (<p>) in "in cell"
    -- Process as if it was "in body":
 * A start tag whose tag name is one of: "address", "blockquote", "center",
   "dir", "div", "dl", "fieldset", "h1", "h2", "h3", "h4", "h5", "h6",
   "listing", "menu", "ol", "p", "pre", "ul"
    -- Append a <p> element to the <canvas> element.
    -- Stack of open elements has:
       <html><body><table><tbody><tr><td><canvas><p>
 * Anything else (character "F", then later "o" and "o") in "in cell"
    -- Process as if it was "in body":
 * Append a text node "Foo" to the <p> element.
 * Anything else (</p>) in "in cell"
    -- Process as if it was "in body":
 * An end tag whose tag name is "p"
    -- Stack of open elements once again has just:
       <html><body><table><tbody><tr><td><canvas>
    -- Insertion mode is still "in cell".
 * An end tag whose tag name is one of: "td", "th"
    -- Current node is not a <td> (it's <canvas>): EASY PARSE ERROR.
    -- Pop elements until a <td> is popped. Stack of open elements once
       again has just <html><body><table><tbody><tr>.
    -- Switch insertion mode to "in row".
 * An end tag whose tag name is "tr"
    -- Stack of open elements is now: <html><body><table><tbody>.
    -- Switch insertion mode to "in table body".
 * An end tag whose tag name is "table"
    -- Act as if </tbody> had been seen:
 * An end tag whose tag name is one of: "tbody", "tfoot", "thead"
    -- Stack of open elements is <html><body><table>.
    -- Change insertion mode to "in table".
    -- Reprocess the token.
 * An end tag whose tag name is "table"
    -- Stack of open elements is <html><body>.
    -- Change insertion mode to "in body".
 * An end-of-file token
    -- Act as if </body> had been seen:
 * An end tag with the tag name "body"
    -- Switch insertion mode to "after body".
    -- Reprocess the token.
 * An end-of-file token
    -- Act as if </html> had been seen:
 * An end tag with the tag name "html"
    -- Switch to the Trailing End Phase.
    -- Reprocess the token.
 * An end-of-file token
    -- Ignore the token.

The result is a DOM that looks like:

   #document
     HTML
       HEAD
       BODY
         TABLE
           TBODY
             TR
               TD
                 CANVAS
                   P
                     #text ("Foo")

Hopefully everyone was able to follow along at home and get the same result.

> I skimmed the parsing section of the current HTML5 draft (mainly
> 8.2.2.3.7) and noticed that the canvas element is being treated as a
> "phrasing" element. Is this by mistake? I would think it would be
> treated similar to the object element, since they have similar handling
> of fallback content.

All new elements will be treated like <div>, <input>, or <span>, depending
on whether they are structure-like, empty, or something else.

<object> has _complicated_ parsing semantics. We don't want to give any new
element complicated parsing semantics (especially because that wouldn't be
backwards-compatible).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
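For anyone who wants to check the walkthrough above mechanically, here is a toy replay of it in Python. It is a sketch under loud assumptions, not the HTML5 tree-construction algorithm: `Node`, `ToyTreeBuilder` and `dump` are hypothetical names made up for this illustration; only the handful of rules this one input hits are implemented; it starts in the Main Phase with <html> already on the stack; the <head> that is opened and immediately closed is handled in one step rather than via the "in head" mode; and the input is fed in as ready-made tokens, with <canvas/> treated as a plain start tag, as in the walkthrough.

```python
class Node:  # a minimal DOM-like node, e.g. "HTML" or "#text"
    def __init__(self, name, data=""):
        self.name = name
        self.data = data
        self.children = []


class ToyTreeBuilder:  # hypothetical: only the rules this one input exercises
    def __init__(self):
        self.document = Node("#document")
        self.stack = [Node("HTML")]            # stack of open elements
        self.document.children.append(self.stack[0])
        self.mode = "before head"              # current insertion mode
        self.errors = []

    def insert(self, name):  # append to the current node, then push it
        node = Node(name)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def pop_until(self, name):  # pop until an element with this name is popped
        while self.stack.pop().name != name:
            pass

    def reprocess(self, mode, kind, name):  # switch mode, reuse the same token
        self.mode = mode
        self.process(kind, name)

    def process(self, kind, name=None):
        mode = self.mode
        if mode == "in cell" and not (kind == "end" and name in ("td", "th")):
            mode = "in body"                   # "anything else": use "in body"
        if mode == "before head":              # as if <head>, then </head>
            self.insert("HEAD")
            self.stack.pop()
            self.reprocess("after head", kind, name)
        elif mode == "after head":             # as if <body> had been seen
            self.insert("BODY")
            self.reprocess("in body", kind, name)
        elif mode == "in body":
            if kind == "start" and name == "table":
                self.insert("TABLE")
                self.mode = "in table"
            elif kind == "start":              # <canvas>, <p> (no <p> to close)
                self.insert(name.upper())
            elif kind == "char":               # extend or create a text node
                kids = self.stack[-1].children
                if not kids or kids[-1].name != "#text":
                    kids.append(Node("#text"))
                kids[-1].data += name
            elif kind == "end" and name == "p":
                self.pop_until("P")
            elif kind == "eof":                # as if </body> had been seen
                self.reprocess("after body", kind, name)
        elif mode == "in table":
            if kind == "start" and name in ("td", "th", "tr"):
                self.insert("TBODY")           # as if <tbody> had been seen
                self.reprocess("in table body", kind, name)
            elif kind == "end" and name == "table":
                self.pop_until("TABLE")
                self.mode = "in body"
        elif mode == "in table body":
            if kind == "start" and name == "tr":
                self.insert("TR")
                self.mode = "in row"
            elif kind == "end" and name == "table":
                self.pop_until("TBODY")        # as if </tbody> had been seen
                self.reprocess("in table", kind, name)
        elif mode == "in row":
            if kind == "start" and name in ("td", "th"):
                self.insert(name.upper())
                self.mode = "in cell"
            elif kind == "end" and name == "tr":
                self.pop_until("TR")
                self.mode = "in table body"
        elif mode == "in cell":                # only </td> or </th> get here
            if self.stack[-1].name != name.upper():
                self.errors.append("current node is " + self.stack[-1].name)
            self.pop_until(name.upper())
            self.mode = "in row"
        elif mode == "after body":             # as if </html> had been seen
            self.reprocess("trailing end", kind, name)  # EOF is then ignored


def dump(node, depth=0):  # indented tree, in the style used in the message
    label = '#text ("%s")' % node.data if node.name == "#text" else node.name
    return ["  " * depth + label] + [
        line for child in node.children for line in dump(child, depth + 1)]


# The example markup as tokens; <canvas/> is treated as a plain start tag.
tokens = ([("start", t) for t in ("table", "tr", "td", "canvas", "p")]
          + [("char", c) for c in "Foo"]
          + [("end", t) for t in ("p", "td", "tr", "table")] + [("eof", None)])
builder = ToyTreeBuilder()
for kind, name in tokens:
    builder.process(kind, name)
print("\n".join(dump(builder.document)))
print(builder.errors)
```

Running it prints the same tree as above, and `builder.errors` ends up with a single entry for the </td> that arrives while <canvas> is still the current node (the EASY PARSE ERROR in the walkthrough). Note that the <p> lands inside <canvas> simply because nothing ever closes the <canvas>: the trailing slash in <canvas/> has no effect here.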