On Thu, 16 Feb 2006, Tim Altman wrote:
> 
> OK.  Assuming the HTML5 document is served with a text/html doctype, how would
> the following markup be parsed?
> 
> <table>
>   <tr>
>     <td>
>       <canvas/>
>       <p>Foo</p>
>     </td>
>   </tr>
> </table>

You omitted the DOCTYPE, which is a "difficult parse error", so the 
result isn't currently defined (i.e. it triggers Quirks mode). Assuming the 
document started with "<!DOCTYPE HTML>", though, and ignoring all 
whitespace (nothing interesting happens with whitespace):

   * Tree Construction starts in the Initial Phase.
   * A DOCTYPE token marked as being correct
      -- Append a DocumentType node to the Document node
      -- Switch to the Root Element Phase.
   * A start tag token (<table>)
      -- Append an <html> element to the Document node.
      -- Switch to the Main Phase
          * Main phase state:
             -- Insertion mode is in the "before head" mode.
             -- Stack of open elements has just <html>.
      -- Reprocess the token:
   * "Anything else" (<table>) in "before head"
      -- Act as if <head> had been seen:
          * A start tag token with the tag name "head"
             -- Append a <head> element to the <html> element.
             -- Stack of open elements has <html><head>.
             -- Switch to the "in head" insertion mode.
      -- Reprocess the token:
   * "Anything else" (<table>) in "in head"
      -- Act as if </head> had been seen:
          * An end tag token with the tag name "head"
             -- Stack of open elements again just has <html>.
             -- Change the insertion mode to "after head".
      -- Reprocess the token:
   * "Anything else" (<table>) in "after head"
      -- Act as if <body> had been seen:
          * A start tag token with the tag name "body"
             -- Append a <body> element to the <html> element.
             -- Stack of open elements has <html><body>.
             -- Switch to the "in body" insertion mode.
      -- Reprocess the token:
   * A start tag whose tag name is "table"
      -- Append a <table> element to the <body> element.
      -- Stack of open elements has <html><body><table>.
      -- Switch to the "in table" insertion mode.
   * A start tag whose tag name is one of: "td", "th", "tr"
      -- Act as if <tbody> had been seen:
          * A start tag whose tag name is one of: "tbody", "tfoot", "thead" 
             -- Append a <tbody> to the <table> element.
             -- Stack of open elements has <html><body><table><tbody>.
             -- Switch to the "in table body" insertion mode.
      -- Reprocess the token:
   * A start tag whose tag name is "tr"
      -- Append a <tr> element to the <tbody> element.
      -- Stack of open elements has <html><body><table><tbody><tr>.
      -- Switch to the "in row" insertion mode.
   * A start tag whose tag name is one of: "th", "td"
      -- Append a <td> element to the <tr> element.
      -- Stack of open elements has <html><body><table><tbody><tr><td>.
      -- Switch to the "in cell" insertion mode.
   * Anything else (<canvas>) in "in cell"
      -- Process as if it was "in body":
          * A start tag token not covered by the previous entries (<canvas>)
             -- Append a <canvas> element to the <td> element.
             -- Stack of open elements has:
                  <html><body><table><tbody><tr><td><canvas>
   * Anything else (<p>) in "in cell"
      -- Process as if it was "in body":
          * A start tag whose tag name is one of: "address", "blockquote", 
            "center", "dir", "div", "dl", "fieldset", "h1", "h2", "h3", 
            "h4", "h5", "h6", "listing", "menu", "ol", "p", "pre", "ul" 
             -- Append a <p> element to the <canvas> element.
             -- Stack of open elements has:
                  <html><body><table><tbody><tr><td><canvas><p>
   * Anything else (character "F", then later "o" and "o") in "in cell"
      -- Process as if it was "in body":
          * Append a text node Foo to the <p> element.
   * Anything else (</p>) in "in cell"
      -- Process as if it was "in body":
          * An end tag whose tag name is "p" 
             -- Stack of open elements once again has just:
                  <html><body><table><tbody><tr><td><canvas>
             -- Insertion mode is still "in cell".
   * An end tag whose tag name is one of: "td", "th"
      -- Current node is not a <td> (it's <canvas>): EASY PARSE ERROR.
      -- Pop elements until a <td> is popped. Stack of open elements once 
         again has just <html><body><table><tbody><tr>.
      -- Switch insertion mode to "in row".
   * An end tag whose tag name is "tr" 
      -- Stack of open elements is now: <html><body><table><tbody>.
      -- Switch insertion mode to "in table body".
   * An end tag whose tag name is "table" 
      -- Act as if </tbody> had been seen:
          * An end tag whose tag name is one of: "tbody", "tfoot", "thead" 
             -- Stack of open elements is <html><body><table>.
             -- Change insertion mode to "in table".
      -- Reprocess the token.
   * An end tag whose tag name is "table" 
      -- Stack of open elements is <html><body>.
      -- Change insertion mode to "in body".
   * An end-of-file token 
      -- Act as if </body> had been seen:
          * An end tag with the tag name "body"
             -- Switch insertion mode to "after body".
      -- Reprocess the token.
   * An end-of-file token 
      -- Act as if </html> had been seen:
          * An end tag with the tag name "html"
             -- Switch to the Trailing End Phase.
      -- Reprocess the token.
   * An end-of-file token 
      -- Ignore the token.
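
The "act as if <x> had been seen, then reprocess the token" pattern at the 
start of the list can be sketched as a small loop. This is only a toy model 
in Python, not the draft's algorithm: the function name reach_in_body is 
made up for illustration, and it assumes the pending token is an "anything 
else" token in all three pre-body insertion modes, as <table> is above.

```python
def reach_in_body(stack, mode):
    """Apply the implied-tag steps until the insertion mode is "in body".

    Assumption: the pending token is "anything else" in each of the
    three pre-body modes (true for <table> in the walkthrough).
    """
    steps = []
    while mode != "in body":
        if mode == "before head":
            stack.append("head")   # act as if <head> had been seen
            mode = "in head"
        elif mode == "in head":
            stack.pop()            # act as if </head> had been seen
            mode = "after head"
        elif mode == "after head":
            stack.append("body")   # act as if <body> had been seen
            mode = "in body"
        else:
            raise ValueError("unexpected mode: " + mode)
        # Record the new mode and a snapshot of the stack of open elements.
        steps.append((mode, list(stack)))
    return mode, steps


stack = ["html"]   # state on entering the Main Phase
mode, steps = reach_in_body(stack, "before head")
for new_mode, snapshot in steps:
    print(new_mode, snapshot)
```

Each pass through the loop corresponds to one "Reprocess the token" step; 
the token itself is only consumed once the mode is "in body".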

The result is a DOM that looks like:

    #document
      HTML
        HEAD
        BODY
          TABLE
            TBODY
              TR
                TD
                  CANVAS
                    P
                      #text ("Foo")

Hopefully everyone was able to follow along at home and get the same 
result.
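
For anyone who would rather follow along with code, here is a minimal 
sketch in Python that rebuilds the tree above from a hand-written token 
list. It is a toy under stated assumptions, not the parsing algorithm from 
the draft: Node, build and dump are hypothetical names, <html>, <head> and 
<body> are created up front rather than implied step by step, only the 
implied <tbody> is modelled, and parse errors are not reported. The 
<canvas/> from the example is listed as a plain start tag, since the 
element stays on the stack of open elements in the walkthrough.

```python
class Node:
    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


def build(tokens):
    document = Node("#document")
    html, head, body = Node("html"), Node("head"), Node("body")
    document.children.append(html)
    html.children += [head, body]
    stack = [html, body]   # <head> was implied and closed again
    for kind, value in tokens:
        current = stack[-1]
        if kind == "start":
            if value == "tr" and current.name == "table":
                tbody = Node("tbody")   # act as if <tbody> had been seen
                current.children.append(tbody)
                stack.append(tbody)
                current = tbody
            node = Node(value)
            current.children.append(node)
            stack.append(node)
        elif kind == "text":
            current.children.append(Node("#text", value))
        elif kind == "end":
            if not any(n.name == value for n in stack):
                continue   # no such open element: ignore the token
            # Pop until an element with this name has been popped.
            while stack.pop().name != value:
                pass
    return document


def dump(node, depth=0):
    label = node.name if node.name.startswith("#") else node.name.upper()
    if node.text is not None:
        label += ' ("%s")' % node.text
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


TOKENS = [
    ("start", "table"), ("start", "tr"), ("start", "td"),
    ("start", "canvas"),   # the "/" in <canvas/> does not close it
    ("start", "p"), ("text", "Foo"), ("end", "p"),
    ("end", "td"), ("end", "tr"), ("end", "table"),
]

lines = dump(build(TOKENS))
print("\n".join(lines))
```

The </td> case is the interesting one: the loop pops <canvas> before it 
reaches <td>, which is the point where the walkthrough flags the easy 
parse error.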


> I skimmed the parsing section of the current HTML5 draft (mainly 
> 8.2.2.3.7) and noticed that the canvas element is being treated as a 
> "phrasing" element. Is this by mistake?  I would think it would be 
> treated similar to the object element, since they have similar handling 
> of fallback content.

New elements will all be either treated like <div>, <input>, or <span>, 
depending on whether they are structure-like, empty, or something else.
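
As a rough illustration of that three-way split, here is a Python sketch of 
how the choice could affect the stack of open elements. The category 
strings, MODEL_FOR and insert_start_tag are all hypothetical names; the 
only classification taken from this thread is that <canvas> falls in the 
"something else" group.

```python
# Hypothetical category names mapped to the existing element each one
# is parsed like.
MODEL_FOR = {"structure-like": "div", "empty": "input", "other": "span"}


def insert_start_tag(stack, name, kind):
    """Update the stack of open elements for a start tag; return the model."""
    model = MODEL_FOR[kind]
    if model != "input":
        # div-like and span-like elements stay open until closed.
        stack.append(name)
    # input-like (empty) elements are never left on the stack.
    return model


stack = ["html", "body", "table", "tbody", "tr", "td"]
model = insert_start_tag(stack, "canvas", "other")
print(model, stack[-1])
```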

<object> has _complicated_ parsing semantics. We don't want to make any 
new elements have complicated parsing semantics (especially because that 
wouldn't be backwards-compatible).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
