Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-31 Thread Ian Hickson
On Thu, 4 Jul 2013, Michael Day wrote:
  
  The problem is that we can't do (2) in _all_ cases, e.g. innerHTML on 
  an svg can't possibly break out of the svg if it sees one of these 
  tags, since that's the root of what is being parsed.
 
 Yes, HTML has already lost the composability of parsing that XML and 
 other languages have, that's long gone. But that doesn't mean we should 
 try to make it even more irregular :)
 
 Currently Firefox, Chrome, and Prince all treat the fragment case the 
 same as the whole document case, so we already have interoperable 
 behaviour on this issue.

If you treated them the same, you would either crash or have an infinite 
loop, because you'd either pop the root element off the stack and then try 
to append something to null, or you'd try to reprocess the token without 
having popped anything first.

There has to be _some_ special casing of svg.innerHTML.
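
To make the failure mode concrete, here is a minimal sketch of a naive break-out step applied to both cases. The stack model, the `naiveBreakOut` helper, and the choice to put the context svg at the bottom of the fragment's stack are illustrative assumptions, not the spec's algorithm (which also considers integration points).

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Naive break-out (hypothetical helper): pop foreign elements until the
// current node is in the HTML namespace. Returns the node that would
// receive the reprocessed token, or null if nothing is left on the stack.
function naiveBreakOut(stack) {
  const open = stack.slice(); // work on a copy of the stack of open elements
  while (open.length > 0 && open[open.length - 1].ns !== HTML_NS) {
    open.pop();
  }
  return open.length > 0 ? open[open.length - 1] : null;
}

// Whole-document case: the svg has HTML ancestors to fall back to.
const documentStack = [
  { name: "html", ns: HTML_NS },
  { name: "body", ns: HTML_NS },
  { name: "svg", ns: SVG_NS },
  { name: "g", ns: SVG_NS },
];

// Fragment case, assuming the context svg is the root of what is parsed.
const fragmentStack = [
  { name: "svg", ns: SVG_NS },
  { name: "g", ns: SVG_NS },
];

console.log(naiveBreakOut(documentStack)); // the body entry
console.log(naiveBreakOut(fragmentStack)); // null: nothing to append to
```

In this model the document case lands on body, while the fragment case pops everything, which is the "append something to null" outcome. An implementation that refuses to pop the root would instead reprocess the same token with an unchanged stack, which is the infinite loop.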

What should the special casing be? Consider this case:

   svg.innerHTML = '<g><p>'

I can see two possible options:

svg
|
+-- g
    |
    +-- P

Or:

svg
|
+-- g
|
+-- P

Neither is what happens in the non-fragment case (in that case the <p> is 
a sibling of the <svg>).

Consider this case:

   svg.innerHTML = '<g><svg><g><p>'

Here, the P node could be a child of the innermost <g>, the innermost 
<svg>, the outermost <g>, or the outermost <svg>. I could see arguments 
for all those cases. It seems unlikely that the author meant any of them.
 

 Since the HTML spec is supposed to reflect reality, it seems pointless 
 to deliberately introduce an inconsistency in the parsing model that 
 requires changes in all user agents to implement.

All the user agents (or at least, all the browsers I could test) have to 
change anyway. Blink-based browsers and WebKit-based browsers don't 
support innerHTML on svg at all. Firefox supports innerHTML on svg but 
puts all the nodes in the HTML namespace.

In conclusion, the reason I simply removed the quirk from fragment parsing 
rather than trying to make it work is that:

 - all browsers will have to change anyway,

 - the quirk needs special handling in the fragment case anyway,

 - it's not clear what the behaviour should be,

 - in many cases, we're not error-correcting in a useful way anyway.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-03 Thread Michael Day

Hi Ian,


 We don't have any data that says that we need to support this for
 innerHTML. I think it's a win if we can drop the hack from innerHTML.


Okay, so allowing some HTML elements to break out of foreign content is 
a hack added for historical reasons, that will surprise authors and 
complicate implementations and is thus regrettable, but necessary.


Then there are two possibilities for fragment parsing:

(1) The hack can be left out of fragment parsing, as there is no 
historical justification for it. Since the hack is bad, removing it from 
as many situations as possible is good.


(2) The hack can apply to fragment parsing in the same way as it applies 
to regular parsing. This makes parsing behaviour more consistent across 
different situations, which is good.


I'm strongly in favour of (2), as it seems that omitting the hack from 
some rare situations doesn't save authors any trouble, and doesn't 
follow the principle of least surprise.


In an ideal world it would be possible to grab any subsection of a 
document, parse that in isolation as a fragment, and get the same result 
as if it was parsed in its original document context. This is possible 
in XML, but not HTML, due to the existing author-friendly hacks, and 
making the parsing behaviour even more context sensitive doesn't seem 
like a good thing.


Best regards,

Michael

--
Prince: Print with CSS!
http://www.princexml.com


Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-03 Thread Ian Hickson
On Thu, 4 Jul 2013, Michael Day wrote:
  
  We don't have any data that says that we need to support this for 
  innerHTML. I think it's a win if we can drop the hack from innerHTML.
 
 Okay, so allowing some HTML elements to break out of foreign content is 
 a hack added for historical reasons, that will surprise authors and 
 complicate implementations and is thus regrettable, but necessary.
 
 Then there are two possibilities for fragment parsing:
 
 (1) The hack can be left out of fragment parsing, as there is no 
 historical justification for it. Since the hack is bad, removing it from 
 as many situations as possible is good.
 
 (2) The hack can apply to fragment parsing in the same way as it applies 
 to regular parsing. This makes parsing behaviour more consistent across 
 different situations, which is good.
 
 I'm strongly in favour of (2), as it seems that omitting the hack from 
 some rare situations doesn't save authors any trouble, and doesn't 
 follow the principle of least surprise.

The problem is that we can't do (2) in _all_ cases, e.g. innerHTML on an 
svg can't possibly break out of the svg if it sees one of these tags, 
since that's the root of what is being parsed.

Given that, it's not clear that (2) is better than (1). (I agree that if 
we could actually always be consistent, it would be.)

Note that this isn't the only place like that.

   <table>
    <div>
   </table>

...and:

   document.createElement('table').innerHTML = '<div>';

...result in very different DOMs (in the first, the <div> and the 
<table> are siblings; in the latter, the <div> is a child).


 In an ideal world it would be possible to grab any subsection of a 
 document, parse that in isolation as a fragment, and get the same result 
 as if it was parsed in its original document context. This is possible 
 in XML, but not HTML, due to the existing author-friendly hacks, and 
 making the parsing behaviour even more context sensitive doesn't seem 
 like a good thing.

I think we're _so_ far beyond this ideal world that I'm not sure it's 
worth even looking for it, to be honest. :-)



Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-03 Thread Michael Day

Hi Ian,


 The problem is that we can't do (2) in _all_ cases, e.g. innerHTML on an
 svg can't possibly break out of the svg if it sees one of these tags,
 since that's the root of what is being parsed.


Yes, HTML has already lost the composability of parsing that XML and 
other languages have, that's long gone. But that doesn't mean we should 
try to make it even more irregular :)


Currently Firefox, Chrome, and Prince all treat the fragment case the 
same as the whole document case, so we already have interoperable 
behaviour on this issue.


Since the HTML spec is supposed to reflect reality, it seems pointless 
to deliberately introduce an inconsistency in the parsing model that 
requires changes in all user agents to implement.


Best regards,

Michael



Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-02 Thread Michael Day

Hi Ian,


 I ended up removing this from the spec for other reasons, so this should
 be resolved now. Let me know if it's not.

 (No, I don't know what I had originally intended.)


I don't think the new spec is correct. The question is what happens if 
we are tokenizing some foreign content, and we see an HTML start tag.


In the normal case, we pop off all the foreign elements until we get 
back to the HTML namespace, then reprocess the token.


In the fragment case, the context element may be a foreign element, so 
there was the wrinkle of having to handle that appropriately when we 
have this fake root html element that makes everything confusing.


The new text reads:

 If the parser was originally created for the HTML fragment parsing 
 algorithm, then act as described in the "any other start tag" entry 
 below. (fragment case)


This always just adds the HTML element in place inside the foreign 
content, even if the fragment context element *is* a HTML element!


This can't be right, as it means parsing document.body.innerHTML will 
behave totally differently to parsing <html><body>, for no reason.


Looking back a couple of years, this section of the spec seems to be 
drifting in a random walk away from reality. We can study this further 
and try suggesting some text based on what we have implemented so far.


Best regards,

Michael



Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-02 Thread Ian Hickson
On Tue, 2 Jul 2013, Michael Day wrote:
 
 The new text reads:
 
 If the parser was originally created for the HTML fragment parsing algorithm,
 then act as described in the "any other start tag" entry below. (fragment
 case)
 
 This always just adds the HTML element in place inside the foreign content,
 even if the fragment context element *is* a HTML element!

Right, that's the intent.

This specific clause is a hack to make certain elements break out of 
foreign content, because we found some pages that do crazy stuff like:

   <p>Bla bla
   <svg>
   <p>Bla bla

...which, prior to SVG being added to HTML, would show two paragraphs, but 
if we didn't have this hack, it would now just end the page at the svg tag.
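
As a rough illustration of what the hack buys for that markup, the sketch below runs a toy tree builder with and without the break-out. Everything here is an assumption made for illustration: the `build` function, its token shape, and the fact that it only understands <p>, <svg>, and text. It is not the spec's tree construction algorithm.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Toy tree builder (hypothetical): handles only <p>, <svg>, and text tokens.
function build(tokens, breakOutEnabled) {
  const body = { name: "body", ns: HTML_NS, children: [] };
  const stack = [body];
  const current = () => stack[stack.length - 1];
  const insert = (name, ns) => {
    const node = { name, ns, children: [] };
    current().children.push(node);
    stack.push(node);
  };

  for (const token of tokens) {
    if (token.text !== undefined) {
      current().children.push(token.text);
      continue;
    }
    const inForeign = current().ns !== HTML_NS;
    if (inForeign && !(breakOutEnabled && token.tag === "p")) {
      // No break-out: the element is created inside the foreign content.
      insert(token.tag, current().ns);
      continue;
    }
    // Break out: pop foreign elements, then handle the tag as HTML.
    while (current().ns !== HTML_NS) stack.pop();
    if (token.tag === "p") {
      // An already open p is closed before the new one starts.
      const open = stack.findIndex((node) => node.name === "p");
      if (open !== -1) stack.length = open;
      insert("p", HTML_NS);
    } else if (token.tag === "svg") {
      insert("svg", SVG_NS);
    }
  }
  return body;
}

const tokens = [
  { tag: "p" }, { text: "Bla bla" },
  { tag: "svg" },
  { tag: "p" }, { text: "Bla bla" },
];

const withHack = build(tokens, true);
const withoutHack = build(tokens, false);
console.log(withHack.children.length);    // 2 paragraphs under body
console.log(withoutHack.children.length); // 1: the rest is inside the svg
```

With the break-out enabled, the second paragraph ends up as a sibling of the first. Without it, the second p is created in the SVG namespace inside the svg, so it never renders as an HTML paragraph.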


 This can't be right, as it means parsing document.body.innerHTML will 
 behave totally differently to parsing <html><body>, for no reason.

Not totally differently, only differently in the specific cases of these 
few tags that trigger this wacked behaviour in markup that's broken anyway.

We don't have any data that says that we need to support this for 
innerHTML. I think it's a win if we can drop the hack from innerHTML.


 Looking back a couple of years, this section of the spec seems to be 
 drifting in a random walk away from reality. We can study this further 
 and try suggesting some text based on what we have implemented so far.

Well, when it started it wasn't reality at all, since there was no foreign 
content support in text/html. :-)



Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-07-01 Thread Ian Hickson
On Thu, 18 Apr 2013, Michael Day wrote:
 
 Another issue regarding recent changes to 12.2.5.5 The rules for 
 parsing tokens in foreign content.
 
 When a HTML start tag is seen (specifically <b>, <big>, <blockquote>, 
 <body>, <br>, <center>, <code>, ...) the following procedure is given to 
 recover from the parse error:
 
 
 If the stack of open elements does not have an element in scope that is a
 MathML text integration point, an HTML integration point, or an element in the
 HTML namespace, or if the stack of open elements has only one element, then
 process the token using the rules for the "in body" insertion mode. (fragment
 case)
 
 
 Since the stack of open elements always has <html> at the top of the 
 stack, the element in scope algorithm will always find it, and as a 
 result, the first part of the condition will always fail.

I ended up removing this from the spec for other reasons, so this should 
be resolved now. Let me know if it's not.

(No, I don't know what I had originally intended.)



Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-06-23 Thread Michael Day

Hi Adam,


  Since the stack of open elements always has <html> at the top of the stack,
  the element in scope algorithm will always find it, and as a result, the
  first part of the condition will always fail.


 Even in the fragment case?  (Note the parenthetical remark in the spec
 about this text applying only in the fragment case.)


Yes, see 12.4, the stack of open elements always contains a <html> root 
in the fragment case when there is a context element:


 Let root be a new html element with no attributes.
 ...
 Set up the parser's stack of open elements so that it contains just
 the single element root.
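
A small sketch may help show why the first part of the condition can never hold once that root is on the stack. The `hasElementInScope` function below is a simplified stand-in for the spec's "has an element in scope" algorithm: the scope-marker set is an abbreviated, HTML-only assumption, and integration points are left out.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Abbreviated set of scope boundaries (assumption: HTML-namespace ones only).
const SCOPE_MARKERS = new Set([
  "html", "table", "td", "th", "caption", "applet", "marquee", "object",
]);

// Simplified "has an element in scope": walk from the current node towards
// the root; succeed on the first target, fail on the first scope boundary.
function hasElementInScope(stack, isTarget) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const node = stack[i];
    if (isTarget(node)) return true;
    if (node.ns === HTML_NS && SCOPE_MARKERS.has(node.name)) return false;
  }
  return false;
}

const isHtmlElement = (node) => node.ns === HTML_NS;

// Fragment case with an svg context element: the synthetic html root is
// always the first entry, followed by whatever foreign elements are open.
const fragmentStack = [
  { name: "html", ns: HTML_NS },
  { name: "g", ns: SVG_NS },
];

// Hypothetical stack with no HTML element at all, for contrast only.
const foreignOnlyStack = [
  { name: "svg", ns: SVG_NS },
  { name: "g", ns: SVG_NS },
];

// First part of the quoted condition: "does not have an element in scope
// that is ... an element in the HTML namespace".
console.log(!hasElementInScope(fragmentStack, isHtmlElement));    // false
console.log(!hasElementInScope(foreignOnlyStack, isHtmlElement)); // true
```

Because the walk always reaches the html root, the check succeeds for every stack the fragment algorithm can produce in this model, so only the "has only one element" part of the condition can ever trigger.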

Best regards,

Michael

--
Prince: Print with CSS!
http://www.princexml.com


Re: [whatwg] Another issue in 12.2.5.5 parsing tokens in foreign content

2013-06-22 Thread Adam Barth
On Thu, Apr 18, 2013 at 12:27 AM, Michael Day mike...@yeslogic.com wrote:
 Another issue regarding recent changes to 12.2.5.5 The rules for parsing
 tokens in foreign content.

 When a HTML start tag is seen (specifically <b>, <big>, <blockquote>,
 <body>, <br>, <center>, <code>, ...) the following procedure is given to
 recover from the parse error:

 
 If the stack of open elements does not have an element in scope that is a
 MathML text integration point, an HTML integration point, or an element in
 the HTML namespace, or if the stack of open elements has only one element,
 then process the token using the rules for the "in body" insertion mode.
 (fragment case)
 

 Since the stack of open elements always has <html> at the top of the stack,
 the element in scope algorithm will always find it, and as a result, the
 first part of the condition will always fail.

Even in the fragment case?  (Note the parenthetical remark in the spec
about this text applying only in the fragment case.)

Adam


 This seems unintentional, and depends upon the exact way in which the
 element in scope algorithm is defined.

 Perhaps rewriting this paragraph without reference to the element in scope
 algorithm would make the intent clearer? For example:

 If the stack of open elements does not contain any elements that are MathML
 text integration points, or HTML integration points, or that are in the HTML
 namespace, or if the stack of open elements has only one element ...

 Any thoughts?

 Best regards,

 Michael