From: "Stefan Persson" <[EMAIL PROTECTED]>
> Philippe Verdy wrote:
> > Some browsers will need NCRs, some will accept UTF-8, some will need a
> > "x-user-defined" encoding which is not a standard encoding for use in conforming
> > HTML 3.2...
>
> Isn't that only the case with non-BMP code points?
I don't know, but for now IE is known to support non-BMP characters only through NCRs, even in UTF-8 documents, and only if the "x-user-defined" encoding is specified (a non-standard alias of UTF-8 with special behavior: it selects a specific user-defined default font instead of the per-script default font) or if the user manually selects the "User-Defined" encoding in the Display menu, and only after patching some registry entries.

However, NCRs are sometimes the only way to display non-Latin characters in some browsers, because those browsers rely only on the user's locale to determine the language and its "preferred" encoding. As far as I know, Unicode is the only alternative to ISCII for Indian Brahmic scripts; Urdu written in the Arabic script, however, may be supported with the Arabic ISO 8859 charset.

UTF-8 is very viable for now for all scripts that can be restricted to the BMP. UTF-8 support outside the BMP is bogus or nonexistent in too many browsers... (I did not try, but IE may support the CESU-8 or UTF-16 encoding for characters outside the BMP, because UTF-16 is the encoding used internally for strings handled through its JavaScript interface.)

One can test very simply whether JavaScript supports UTF-32 strings, by making a character from a non-BMP code point value such as 0x10FFFD and reading the code of the first "character" of that string. One can also test in JavaScript whether UTF-16 is supported by performing the same test with 0xD800 and 0xDC00. Depending on the result, you may then be able to display strings containing non-BMP characters, but still only if the user manually selects the "User-Defined" encoding in IE's display menu or the page is sent with an explicit header specifying the charset: "Content-Type: text/html; charset=x-user-defined", and that cannot be done through JavaScript...
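
The two JavaScript tests described above could be sketched roughly as follows. This is only an illustration, not something I have run in IE: the helper names describeNonBmp and surrogatePairSurvives are made up for this example, and String.fromCodePoint is a later addition to the language that older engines lack, so the sketch falls back to String.fromCharCode, which truncates its argument to 16 bits.

```javascript
// Hypothetical helpers; only the String methods themselves are standard.

// Build a string from one code point above the BMP and report how many
// code units the engine used and what the first unit is.
function describeNonBmp(codePoint) {
  // Assumption: fromCodePoint may be missing in older engines. In that
  // case fromCharCode keeps only the low 16 bits (0x10FFFD becomes 0xFFFD).
  var s = typeof String.fromCodePoint === "function"
    ? String.fromCodePoint(codePoint)
    : String.fromCharCode(codePoint);
  return { length: s.length, firstUnit: s.charCodeAt(0) };
}

// Build a string from an explicit surrogate pair and check that both
// 16-bit code units come back unchanged.
function surrogatePairSurvives(high, low) {
  var s = String.fromCharCode(high, low);
  return s.length === 2 &&
         s.charCodeAt(0) === high &&
         s.charCodeAt(1) === low;
}

var probe = describeNonBmp(0x10FFFD);

// A first unit in the high-surrogate range 0xD800..0xDBFF with length 2
// means the string is stored as UTF-16 code units. A first unit equal to
// the full code point would mean one string element per code point.
var storedAsUtf16 = probe.length === 2 &&
                    probe.firstUnit >= 0xD800 &&
                    probe.firstUnit <= 0xDBFF;
var storedAsUtf32 = probe.length === 1 && probe.firstUnit === 0x10FFFD;
var pairOk = surrogatePairSurvives(0xD800, 0xDC00);
```

A length of 1 with a first unit of 0xFFFD only says that the fallback truncated the value; in that case the surrogate pair test is the more informative of the two. Neither result says anything about whether the browser can actually render the characters, which still depends on the encoding and font issues described above.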