The spec for document.write() (http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#dom-document-write) says: "... have the tokenizer process the characters that were inserted, one at a time, processing resulting tokens as they are emitted, and stopping when the tokenizer reaches the insertion point..."

But what happens if the last character written by document.write() is a carriage return?

The HTML parsing spec says that CR followed by LF is ignored but CR followed by anything else is converted to LF. So if the last character written is CR, the tokenizer can't process all characters up to the insertion point, because it needs to look ahead at the next character, right?

Firefox, Chrome and Safari all seem to do the right thing: they wait for the next character before tokenizing the CR. I think this means that the description of document.write() needs to be changed. (Opera, on the other hand, gets this wrong and emits a CR character.)
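
To make the "wait for the next character" behaviour concrete, here is a rough sketch of a newline-normalizing stage that holds back a trailing CR until it sees what follows. The class and method names (NewlineNormalizer, write, end) are made up for illustration; this is not spec text and not how any particular browser is implemented.

```javascript
// Hypothetical sketch: normalize CR and CR LF to LF across separate writes.
class NewlineNormalizer {
  constructor() {
    // True when the previous chunk ended with a CR we could not resolve yet.
    this.pendingCR = false;
  }

  // Feed one written chunk; returns only the characters that are safe
  // to hand to the tokenizer right now.
  write(chunk) {
    let out = "";
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk[i];
      if (this.pendingCR) {
        this.pendingCR = false;
        out += "\n"; // the held CR becomes a single LF
        if (ch === "\n") continue; // CR LF: the LF is absorbed
      }
      if (ch === "\r") {
        this.pendingCR = true; // hold it: the next character decides
      } else {
        out += ch;
      }
    }
    return out;
  }

  // End of input: a held CR can no longer be followed by LF.
  end() {
    if (!this.pendingCR) return "";
    this.pendingCR = false;
    return "\n";
  }
}
```

With this sketch, write("a\r") returns only "a", and the CR is resolved by the following write: write("\nb") and write("b") both return "\nb".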

Similarly, what should the tokenizer do if document.write() emits half of a UTF-16 surrogate pair as its last character?
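
One possible answer, by analogy with the CR case, would be to hold back a trailing high surrogate until the next write. The helper below only sketches that idea; the function name is hypothetical, and I am not claiming that the spec or any browser behaves this way.

```javascript
// Hypothetical helper: split a written chunk so that a trailing unpaired
// high (lead) surrogate is held back instead of being tokenized alone.
function splitTrailingHighSurrogate(chunk) {
  if (chunk.length === 0) return { ready: "", held: "" };
  const last = chunk.charCodeAt(chunk.length - 1);
  // High surrogates occupy the range U+D800 to U+DBFF.
  const isHigh = last >= 0xd800 && last <= 0xdbff;
  return isHigh
    ? { ready: chunk.slice(0, -1), held: chunk.slice(-1) }
    : { ready: chunk, held: "" };
}
```

The held code unit would be prepended to the next chunk written, so that the pair reaches the tokenizer as one character.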

    David
