The HTML5 draft says that authors should not use EBCDIC-based encodings. That is laxer than its treatment of CESU-8, UTF-7, BOCU-1 and SCSU, which authors must not use and user agents must not support.

In general, now that UTF-8 exists and is ubiquitously supported, proliferation of encodings is costly and doesn't expand the expressiveness of HTML, which is parsed into a Unicode DOM anyway. Moreover, encodings that are not ASCII supersets are potential security risks: the string "<script>" may be represented by different bytes than in ASCII, which can lead to privilege escalation if a server-side gatekeeper and a user agent give different meanings to the same bytes.
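
To make the gatekeeper concern concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a description of any real filter or browser: naive_gatekeeper is a hypothetical server-side check that assumes ASCII-compatible bytes, and cp037 is simply one EBCDIC code page that Python's standard codecs happen to include.

```python
# Illustration only. naive_gatekeeper is a hypothetical filter; cp037 is
# one EBCDIC code page available in Python's standard codec set.
PAYLOAD = "<script>"


def naive_gatekeeper(raw: bytes) -> bool:
    """Return True if the bytes look safe to a filter that assumes ASCII."""
    return b"<script" not in raw


ascii_bytes = PAYLOAD.encode("ascii")
ebcdic_bytes = PAYLOAD.encode("cp037")
# The same string written out by hand in UTF-7's base64-style escapes.
utf7_bytes = b"+ADw-script+AD4-"

# The ASCII form is caught by the filter.
blocked_ascii = not naive_gatekeeper(ascii_bytes)

# The EBCDIC and UTF-7 forms do not contain the ASCII byte sequence,
# so the filter lets them through.
passed_ebcdic = naive_gatekeeper(ebcdic_bytes)
passed_utf7 = naive_gatekeeper(utf7_bytes)

# A consumer that honors the declared encoding recovers the markup.
recovered_ebcdic = ebcdic_bytes.decode("cp037")
recovered_utf7 = utf7_bytes.decode("utf-7")

print(ascii_bytes.hex(), ebcdic_bytes.hex())
print(blocked_ascii, passed_ebcdic, passed_utf7)
print(recovered_ebcdic, recovered_utf7)
```

The point of the sketch is only that the two parties disagree about what the bytes mean. If user agents never decode such encodings, the disagreement cannot arise in the first place.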

For these reasons, if EBCDIC-based encodings don't need to be supported in order to Support Existing Content, it would be beneficial never to add support for them and, thus, ban them like CESU-8, UTF-7, BOCU-1 and SCSU.

I asked Hixie for examples of sites or browsers that require/support EBCDIC-based encodings. He had none. I examined the encoding menus of Firefox 3b5, Safari 3.1 and Opera 9.5 beta (on Leopard) and IE8 beta 1 (on English XP SP3). None of them expose EBCDIC-based encodings in the UI. (All the IBM encodings Firefox exposes turn out to be ASCII-based.)

This makes me wonder: Do the top browsers support any EBCDIC-based encodings but just without exposing them in the UI? If not, can there be any notable EBCDIC-based Web content?

I suspect that EBCDIC isn't actually Web-relevant.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

