Re: [WSG] HTML Numeric and Named Entities

Lachlan Hunt Tue, 10 Jan 2006 17:23:31 -0800

liorean wrote:

On 11/01/06, Kat <[EMAIL PROTECTED]> wrote:

Is it safe to use the named references that formerly referred to the control
characters?

Yes, it's safe to use the named entity references in HTML4, but it's easier to just use UTF-8 and type the actual characters instead. &mdash; (or any other entity reference) has never referred to a control character; you're getting confused by the fact that IE (and now every other HTML browser, for compatibility) incorrectly interprets character references from &#128; to &#159; (and their hex equivalents) as though the Document Character Set were Windows-1252. This has never been defined in any standard; it is nothing more than widely implemented broken behaviour.
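To make the difference concrete, here is a minimal Python sketch contrasting the two readings of a numeric reference. The function names and the windows_1252_quirk flag are hypothetical, made up for illustration. With the flag off, a reference names a Unicode code point, so &#151; is U+0097, a C1 control character. With it on, 128 to 159 are reinterpreted as Windows-1252 bytes, roughly imitating the browser behaviour described above (not any browser's actual code).

```python
import re

# Decimal (&#151;) or hexadecimal (&#x97; / &#X97;) numeric character reference.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def resolve_code_point(n: int, windows_1252_quirk: bool = False) -> str:
    """Return the character produced by a numeric reference to code point n.

    Spec reading: n is always a Unicode code point (&#151; -> U+0097).
    Quirk reading (hypothetical flag): 128-159 are treated as
    Windows-1252 bytes instead (&#151; -> U+2014 EM DASH).
    """
    if windows_1252_quirk and 128 <= n <= 159:
        try:
            return bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            # undefined; fall back to the plain code point for those.
            return chr(n)
    return chr(n)  # raises ValueError above U+10FFFF


def expand_numeric_refs(text: str, windows_1252_quirk: bool = False) -> str:
    """Replace every numeric character reference in text with its character."""
    def replace(match: re.Match) -> str:
        decimal, hexadecimal = match.groups()
        n = int(decimal) if decimal is not None else int(hexadecimal, 16)
        return resolve_code_point(n, windows_1252_quirk)

    return NUMERIC_REF.sub(replace, text)


if __name__ == "__main__":
    print(ascii(expand_numeric_refs("a&#151;b")))                           # 'a\x97b'
    print(ascii(expand_numeric_refs("a&#151;b", windows_1252_quirk=True)))  # 'a\u2014b'
    print(ascii(expand_numeric_refs("a&#8212;b")))                          # 'a\u2014b'
```

The last line shows the portable option: &#8212; means EM DASH under both readings, so it never depends on the quirk.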

Multi level answer here:
- text/html: Should be perfectly safe.

Yes, it only depends on the availability of fonts and support for the characters used. Not all characters are supported by every browser. For example, the character referred to by &shy; (soft hyphen) isn't supported by Mozilla yet. Also, some older and obsolete browsers don't support all named entities.

- application/xhtml+xml: Should be, but isn't, safe except for the
five named entities of XML. Use decimal or hexadecimal character
references instead.
- application/xml: Only safe in validating user agents, which don't
include browsers. So, use decimal or hexadecimal character references.

There is no difference between the handling of the MIME types; both require the use of a validating parser to handle named entity references. The exception to the rule is that some browsers, such as Mozilla, despite not implementing a validating parser, may have a pseudo-DTD catalog containing just these entity references. Mozilla uses this catalog when it encounters an XHTML DOCTYPE in an XML document, regardless of the MIME type. (It works similarly for MathML too.)
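As a rough illustration of what a non-validating parser does without a DTD, the sketch below uses Python's xml.etree.ElementTree (backed by the non-validating Expat parser) as a stand-in for such a user agent. It only models the no-DOCTYPE case and says nothing about Mozilla's pseudo-DTD catalog. The helper is_well_formed is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


samples = [
    "<p>&amp; &lt; &gt; &quot; &apos;</p>",  # the five predefined XML entities
    "<p>&#8212; &#x2014;</p>",              # decimal and hex character references
    "<p>&mdash;</p>",                       # HTML named entity, no DTD available
]

if __name__ == "__main__":
    for sample in samples:
        # Expected: True, True, False
        print(is_well_formed(sample), sample)
```

Only the last sample fails, because nothing tells the parser what &mdash; means; the numeric references need no DTD at all.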

Character references refer to Unicode code points independent of the
document encoding and character set. At least for HTML4 and XML, if
not for HTML3.2.

As far as character references in HTML are concerned, they have always referred to the Unicode code points since HTML 2.0.
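The independence from the document encoding can be checked with a conforming parser. This sketch uses XML rather than HTML, only because Python's standard library ships a conforming XML parser; the BODY constant and parsed_text helper are hypothetical. The same references are parsed under three declared encodings, two of which cannot even encode U+2014 directly, and they resolve to the same code points every time, with no Windows-1252 remapping of &#151;.

```python
import xml.etree.ElementTree as ET

# The references themselves are pure ASCII, so the document can be encoded
# as ASCII bytes regardless of the encoding it declares.
BODY = "<p>&#8212;&#151;</p>"


def parsed_text(encoding: str) -> str:
    """Parse BODY under the given declared encoding and return the <p> text."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?>{BODY}'
    return ET.fromstring(doc.encode("ascii")).text


if __name__ == "__main__":
    for enc in ("UTF-8", "ISO-8859-1", "US-ASCII"):
        # Expected for each: ['0x2014', '0x97']
        print(enc, [hex(ord(c)) for c in parsed_text(enc)])
```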


See my article:
http://lachy.id.au/log/2005/10/char-refs
(take note of the comments too, which contain a few corrections)

--
Lachlan Hunt
http://lachy.id.au/

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************
