Re: [whatwg] Internal character encoding declaration

Henri Sivonen Sat, 11 Mar 2006 14:49:09 -0800

On Mar 11, 2006, at 17:10, Henri Sivonen wrote:

Initialize a character decoder that decodes the bytes 0x20–0x7E (inclusive) as well as 0x09, 0x0A and 0x0D to the Unicode code points of the same (zero-extended) value, and that maps all other bytes to U+FFFD and raises a REWIND flag.

On further reflection, it occurred to me that emitting the Windows-1252 characters instead of U+FFFD would be a good optimization for the common case where the encoding later turns out to be Windows-1252 or ISO-8859-1. This would require more than one bookkeeping flag, though.
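
To make the above concrete, here is a rough sketch in Python of how such a provisional decoder might look. The names provisional_decode, guess_windows_1252, rewind and guessed are made up for illustration, and splitting the bookkeeping into exactly these two flags is only one possible reading of "more than one bookkeeping flag".

```python
# Bytes that decode to the code point of the same value.
SAFE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def provisional_decode(data: bytes, guess_windows_1252: bool = False):
    """Return (text, rewind, guessed).

    rewind  -- a byte was replaced by U+FFFD, so the output must be redone
               once the real encoding is known.
    guessed -- a byte was emitted as a Windows-1252 guess; the output is only
               valid if the real encoding turns out to be compatible.
               (Hypothetical second flag, not taken from the proposal.)
    """
    out = []
    rewind = False
    guessed = False
    for b in data:
        if b in SAFE:
            out.append(chr(b))
            continue
        if guess_windows_1252 and b >= 0x80:
            # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined;
            # errors="replace" turns those into U+FFFD.
            ch = bytes([b]).decode("cp1252", errors="replace")
            out.append(ch)
            if ch == "\ufffd":
                rewind = True
            else:
                guessed = True
            continue
        out.append("\ufffd")
        rewind = True
    return "".join(out), rewind, guessed
```

With the guess turned off, any byte outside the safe set forces a rewind. With it turned on, a page that really is Windows-1252 or ISO-8859-1 would only set the guessed flag and could keep the text already emitted.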

If a start tag other than html or head is seen, emit an easy parse error.


Same with character data.

Encoding errors are easy parse errors. (Emit U+FFFD on bogus data.)

Except that for the ISO-8859-* family, the easy error recovery should be emitting the characters according to the corresponding Windows-* family superset.
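
As a rough illustration of that recovery rule, a Python sketch could look like the following. The function name recovery_decode and the SUPERSET table are hypothetical, and the table lists only two pairings that I am assuming here (ISO-8859-1 to Windows-1252 and ISO-8859-9 to Windows-1254); it is not meant as a complete or authoritative mapping.

```python
import codecs

# Assumed pairings of an ISO-8859-* encoding with a Windows-* superset,
# keyed by Python's normalized codec name. Illustrative only.
SUPERSET = {
    "iso8859-1": "cp1252",
    "iso8859-9": "cp1254",
}


def recovery_decode(data: bytes, label: str) -> str:
    """Decode data, substituting the Windows-* superset where one is listed.

    Bytes that are still undecodable become U+FFFD.
    """
    name = codecs.lookup(label).name  # normalizes e.g. "ISO-8859-1", "latin-1"
    return data.decode(SUPERSET.get(name, name), errors="replace")
```

For example, recovery_decode(b"\x93hi\x94", "ISO-8859-1") yields the text with curly quotation marks (U+201C and U+201D) rather than C1 control characters, while bogus bytes in other encodings still come out as U+FFFD.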


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
