Lachlan Hunt wrote:
<!DOCTYPE html>
<em><p><span><h1>X</em>Y</span>Z</h1></p>

Mozilla:
BODY
   + EM
   + P
     + SPAN
       + H1
         + EM
           + #text: X
         + #text: YZ

That looks reasonably like what the author would want with that rubbish, except that the Z is within the span, but it's not in the markup. (If you swap <span> with <strong>, the result is even more perplexing, but the Z is not put within the STRONG element.)

I don't like this style because it badly messes up the parent-child relationships. It should be clear from the source that the CSS selector "em p span h1" should match the string "X". However, with Mozilla this isn't the case.

Safari:
BODY
   + EM
     + P
       + SPAN
         + H1
            + #text: X
            + #text: Y
            + #text: Z

In this case, it's all emphasised, instead of just the X like it is in Mozilla. (If you swap <span> with <strong>, the result is almost the same, except there is an additional empty STRONG element added as a child of the EM, after the P, for no apparent reason.)

Why not just a single text node?

I think a simple way to parse what the author meant is to use just the following rules:

1) An opening tag always starts a new element
2) A matching closing tag closes the element
3) A non-matching closing tag (top of the element stack
   doesn't match with the closing tag) closes all still
   open elements until a match is found. Exceptions for
   this rule:
     3.1) There's no matching element in the stack.
          The closing tag will be ignored.
     3.2) Closing tag is for inline element and closing
          it would require closing a block-level element.
          The closing tag will be ignored.
4) At the end of file, all still open elements are closed.

Unless I made a mistake, these rules are usually able to decipher the meaning the author intended. Applying these rules to the example
<em><p><span><h1>X</em>Y</span>Z</h1></p>
gives us

EM
+ P
  + SPAN
    + H1
      + #text: XYZ

which is about the same as Safari's interpretation.
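
The four rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real HTML parser: the INLINE set is my own guess at what counts as an inline element (everything else is treated as block-level for rule 3.2), and the names Node, parse and dump are made up for this example. Adjacent text is merged into a single text node.

```python
import re

# Assumption: the post does not say which elements are inline, so this
# set is illustrative. Anything not listed is treated as block-level.
INLINE = {"em", "strong", "span", "a", "b", "i"}

# Matches declarations such as <!DOCTYPE ...> (skipped), tags, or text.
TOKEN = re.compile(r"<![^>]*>|<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # Node instances or plain strings (text)


def parse(html):
    root = Node("#root")
    stack = [root]  # stack of still-open elements, root at the bottom
    for m in TOKEN.finditer(html):
        closing, name, text = m.groups()
        if text is not None:
            kids = stack[-1].children
            if kids and isinstance(kids[-1], str):
                kids[-1] += text  # merge with the previous text node
            else:
                kids.append(text)
            continue
        if name is None:
            continue  # a <!...> declaration, not an element
        name = name.lower()
        if not closing:
            # Rule 1: an opening tag always starts a new element.
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
            continue
        # Find the nearest open element with the same name (never the root).
        idx = next((i for i in range(len(stack) - 1, 0, -1)
                    if stack[i].name == name), None)
        if idx is None:
            continue  # Rule 3.1: nothing to close, ignore the tag.
        if name in INLINE and any(n.name not in INLINE
                                  for n in stack[idx + 1:]):
            continue  # Rule 3.2: would close a block-level element.
        # Rules 2 and 3: close the match and everything opened after it.
        del stack[idx:]
    # Rule 4: at end of input the remaining open elements are simply
    # left as built; the tree is already complete.
    return root


def dump(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * depth + "#text: " + child)
        else:
            lines.append("  " * depth + child.name.upper())
            lines.extend(dump(child, depth + 1))
    return lines


print("\n".join(dump(parse("<em><p><span><h1>X</em>Y</span>Z</h1></p>"))))
```

Each token is handled as soon as it is read, using only the stack of open elements, so nothing here depends on tags that come later in the input.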

As an added bonus, the above simple algorithm doesn't need to look ahead at tags that come later, so it doesn't prevent incremental rendering.

However, it isn't this easy in the real world, because step 1 must support elements like META, LINK and IMG, which have no end tag and never contain other elements. I think the best way is to close those elements automatically right after they are opened. If an explicit closing tag is found later, it will be ignored in step 3, because there is no longer a matching open element (exception 3.1).
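
A minimal sketch of that idea, again in Python and again with made-up names (VOID, element_paths): elements in VOID are recorded but never pushed onto the stack of open elements, so a later explicit closing tag finds no match and is dropped. VOID holds only the three elements named above; a real list would be longer. To keep it short, this sketch tracks element names only and leaves out exception 3.2.

```python
import re

# Only the elements named in the text; a real list would have more.
VOID = {"meta", "link", "img"}

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def element_paths(html):
    """Return the path of each element created, e.g. 'p/img'."""
    stack, paths = [], []
    for closing, name in TAG.findall(html):
        name = name.lower()
        if closing:
            if name in stack:
                # Close open elements until the match has been popped.
                while stack.pop() != name:
                    pass
            # Otherwise there is no match: the tag is ignored (3.1).
            continue
        paths.append("/".join(stack + [name]))
        if name not in VOID:
            stack.append(name)  # void elements are closed immediately
    return paths


print(element_paths('<p><img src="a.png"></img><em>x</em></p>'))
```

Without the VOID check the EM would end up inside the IMG ("p/img/em"); with it, the EM is a sibling of the IMG, and the stray </img> has no effect.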

--
Mikko
