On 2/9/10 11:56 PM, Tab Atkins Jr. wrote:
On Tue, Feb 9, 2010 at 9:05 PM, Biju<bijumaill...@gmail.com>  wrote:
What should a user agent display when html content is...

<html><body>
<%@ page language="java" %>
</body></html>

At present, IE and Safari display a blank page.

Firefox displays <%@ page language="java" %>

As does Opera, and Firefox with the HTML5 parser enabled.

But for
<html><body>
abc<? echo ">"  ?>  xyz
</body></html>

Firefox displays...
abc " ?>  xyz

As does Opera, and Firefox with the HTML5 parser enabled.

Can someone else with more familiarity with the parser algorithm help
out here?

For the "<%@" case, it looks like the state machine will go through the following states:

  Data state -> Tag open state [1]

When encountering a '%' in the "Tag open" state, the specification says:

    Parse error. Emit a U+003C LESS-THAN SIGN character token
    and reconsume the current input character in the data state.[2]

The tokenizer then stays in the data state until it sees the next '&', '<', or EOF, so the entire string up to the </body> is treated as literal text.
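To make the "<%@" walk-through concrete, here is a minimal Python sketch of just the two states involved. It is not the full tokenizer from the specification: the function name tokenize_fragment is hypothetical, character references ('&') are ignored, and every Tag open branch other than "anything else" is deliberately left unimplemented.

```python
import string


def tokenize_fragment(source):
    """Hypothetical sketch of the Data and Tag open states only.

    Returns the text that would be emitted as character tokens.
    Assumption: the input contains no real tags, comments, or '&'.
    """
    text = []
    state = "data"
    i = 0
    while i < len(source):
        ch = source[i]
        if state == "data":
            if ch == "<":
                state = "tag open"
            else:
                text.append(ch)
            i += 1
        else:  # Tag open state
            if ch in string.ascii_letters or ch in "/!?":
                # Tags, end tags, markup declarations, and bogus comments
                # are out of scope for this sketch.
                raise NotImplementedError("branch not covered by this sketch")
            # Anything else: parse error. Emit '<' as a character token and
            # reconsume the current character in the data state, so the
            # index is intentionally not advanced here.
            text.append("<")
            state = "data"
    if state == "tag open":
        # Input ended right after '<': the '<' is still emitted as text.
        text.append("<")
    return "".join(text)


print(tokenize_fragment('<%@ page language="java" %>'))
```

Under those assumptions the fragment comes back unchanged, which is the "literal text" outcome described above.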

For the "<?" case, the state transitions will be:

  Data state -> Tag open state -> Bogus comment state [1],[2]

The specification then says to:

  Consume every character up to and including the first U+003E
  GREATER-THAN SIGN character (>) or the end of the file (EOF),
  whichever comes first. Emit a comment token whose data is the
  concatenation of all the characters starting from and including
  the character that caused the state machine to switch into the bogus
  comment state, up to and including the character immediately before
  the last consumed character (i.e. up to the character just before the
  U+003E or EOF character). (If the comment was started by the end of
  the file (EOF), the token is empty.)

  Switch to the data state. [3]

Or in other words, stop the bogus comment at the first '>' you see and then start parsing normally again. In this case, that means treating everything up to the next '<' or '&' or EOF as literal text.
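The bogus comment rule can be sketched in a few lines of Python. This is only an illustration of the quoted rule, not a real tokenizer: the helper name split_bogus_comment is hypothetical, and it assumes the input contains at most one "<?" and no other markup.

```python
def split_bogus_comment(source):
    """Hypothetical helper: split input around the first '<?' bogus comment.

    Returns (text_before, comment_data, text_after). comment_data is None
    when the input contains no '<?'.
    """
    start = source.find("<?")
    if start == -1:
        return source, None, ""
    # The comment data starts at the '?' that caused the switch into the
    # bogus comment state and runs up to, but not including, the first '>'.
    end = source.find(">", start + 1)
    if end == -1:
        # No '>' before EOF: the rest of the input is comment data.
        return source[:start], source[start + 1:], ""
    # The '>' itself is consumed; what follows is parsed in the data state.
    return source[:start], source[start + 1:end], source[end + 1:]


before, comment, after = split_bogus_comment('abc<? echo ">"  ?>  xyz')
print(repr(before), repr(comment), repr(after))
```

For the test case from the original question, the comment data ends at the '>' inside the quoted string, so the closing quote and the trailing "?>" fall outside the comment and are rendered as text.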

So the currently-specified behavior in fact matches the observed Firefox behavior (with either parser) on these simple testcases.

-Boris

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tag-open-state
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#bogus-comment-state
