Re: [whatwg] <% text %> and in corporate intranet html content

Boris Zbarsky Tue, 09 Feb 2010 21:26:08 -0800

On 2/9/10 11:56 PM, Tab Atkins Jr. wrote:

On Tue, Feb 9, 2010 at 9:05 PM, Biju <bijumaill...@gmail.com> wrote:

What should a user agent display when the HTML content is...


<html><body>
<%@ page language="java" %>
</body></html>

At present, IE and Safari display a blank page.

Firefox displays: <%@ page language="java" %>


As does Opera, and Firefox with the HTML5 parser enabled.

But for
<html><body>
abc<? echo ">"  ?>  xyz
</body></html>

Firefox displays:
abc " ?>  xyz


As does Opera, and Firefox with the HTML5 parser enabled.

Can someone else with more familiarity with the parser algorithm help
out here?

For the "<%@" case, it looks like the state machine will go through the following states:


  Data state -> Tag open state [1]

When encountering a '%' in the "Tag open" state, the specification says:


    Parse error. Emit a U+003C LESS-THAN SIGN character token
    and reconsume the current input character in the data state.[2]

So the state will then remain "Data state" until the next '&' or '<' or EOF is seen, and the entire string up to the </body> will be treated as literal text.
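For illustration, here is a minimal Python sketch of only the two states involved in the "<%@" case. The function name tokenize_tag_open_fallback and the (kind, data) token tuples are hypothetical; the sketch treats '&' as literal text instead of entering the character reference state, merges adjacent character tokens into one text run, and deliberately does not model real tag, end-tag or comment handling.

```python
from typing import List, Tuple

# Hypothetical token representation: (kind, data).
Token = Tuple[str, str]


def tokenize_tag_open_fallback(src: str) -> List[Token]:
    """Sketch of the data state plus the tag open state's
    "anything else" branch; nothing else from the spec is modeled."""
    text: List[str] = []
    state = "data"
    i = 0
    while i < len(src):
        c = src[i]
        if state == "data":
            if c == "<":
                state = "tag_open"
            else:
                # Simplification: '&' is kept as literal text here instead
                # of entering the character reference state.
                text.append(c)
            i += 1
        else:  # state == "tag_open"
            if (c.isascii() and c.isalpha()) or c in "!/?":
                raise NotImplementedError("tag, end tag and comment states not modeled")
            # Parse error: emit '<' as a character, then reconsume the
            # current character in the data state (i is not advanced).
            text.append("<")
            state = "data"
    if state == "tag_open":
        # Input ended right after '<': emit it as a character as well.
        text.append("<")
    # Adjacent character tokens are merged into a single text run.
    return [("text", "".join(text))] if text else []


print(tokenize_tag_open_fallback('<%@ page language="java" %>'))
# -> [('text', '<%@ page language="java" %>')]
```

Because '%' sends the tokenizer straight back to the data state, the whole "<%@ ... %>" run comes out as one piece of literal text, which is what Firefox and Opera render.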


For the "<?" case, the state transitions will be:

  Data state -> Tag open state -> Bogus comment state [1][2]

Then the specification says to:

  Consume every character up to and including the first U+003E
  GREATER-THAN SIGN character (>) or the end of the file (EOF),
  whichever comes first. Emit a comment token whose data is the
  concatenation of all the characters starting from and including
  the character that caused the state machine to switch into the bogus
  comment state, up to and including the character immediately before
  the last consumed character (i.e. up to the character just before the
  U+003E or EOF character). (If the comment was started by the end of
  the file (EOF), the token is empty.)

  Switch to the data state. [3]

Or in other words, stop the bogus comment at the first '>' you see and then start parsing normally again. In this case the first '>' is the one inside the quoted ">" string, so the comment ends there, and everything after it (" ?>  xyz) up to the next '<' or '&' or EOF is treated as literal text.
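A minimal Python sketch of the bogus comment step as quoted above. The helper bogus_comment is hypothetical: it takes the index of the character that caused the switch (here the '?') and returns the comment data plus the index at which the data state resumes. It models only this one state, not a full tokenizer.

```python
from typing import Tuple


def bogus_comment(src: str, start: int) -> Tuple[str, int]:
    """Consume up to and including the first '>' (or EOF), starting at
    the character that caused the switch into the bogus comment state.
    Returns (comment data, index where the data state resumes)."""
    end = src.find(">", start)
    if end == -1:
        # EOF reached: the comment runs to the end of the input
        # (and is empty if it started at EOF).
        return src[start:], len(src)
    # Comment data excludes the '>' itself; parsing resumes after it.
    return src[start:end], end + 1


src = 'abc<? echo ">"  ?>  xyz'
lt = src.index("<?")
comment, resume = bogus_comment(src, lt + 1)
print(repr(src[:lt]))      # 'abc'
print(repr(comment))       # '? echo "'
print(repr(src[resume:]))  # '"  ?>  xyz'
```

The text before and after the comment ("abc" and '"  ?>  xyz') is what ends up displayed, which renders as abc " ?> xyz once whitespace is collapsed.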

So the currently-specified behavior in fact matches the observed Firefox behavior (with either parser) on these simple testcases.


-Boris

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#bogus-comment-state
