liorean wrote:
> On 11/01/06, Kat <[EMAIL PROTECTED]> wrote:
>> Is it safe to use the named references that formerly referred to the control
>> characters?

Yes, it's safe to use the named entity references in HTML4, but it's easier to just use UTF-8 and type the actual characters instead. &mdash; (or any other entity reference) has never referred to a control character. You're getting confused by the fact that IE (and now every other HTML browser, for compatibility) incorrectly interprets the numeric character references from &#128; to &#159; (and their hex equivalents), which in Unicode are C1 control characters, as though the Document Character Set were Windows-1252. This has never been defined in any standard; it is nothing more than widely implemented broken behaviour.
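
To make the distinction concrete, here is a minimal sketch, assuming Python 3 and its standard html module (my choice for illustration; any language would do). html.unescape happens to reproduce the browser-compatible remapping, so it shows &#151; resolving to the character that byte 0x97 holds in Windows-1252, while &mdash; names U+2014 directly with no remapping involved.

```python
import html

# Code point 151 (U+0097) is a C1 control character in Unicode.
# Byte 0x97 in Windows-1252, however, is the em dash, U+2014.
byte_as_cp1252 = bytes([151]).decode("cp1252")

# html.unescape applies the browser-style remapping for &#128; to &#159;,
# in decimal and in hex.
numeric_ref = html.unescape("&#151;")
hex_ref = html.unescape("&#x97;")

# The named reference refers to U+2014 directly.
named_ref = html.unescape("&mdash;")

for label, value in [
    ("byte 0x97 as cp1252", byte_as_cp1252),
    ("&#151;", numeric_ref),
    ("&#x97;", hex_ref),
    ("&mdash;", named_ref),
]:
    print(f"{label}: U+{ord(value):04X}")
```

All four lines should report the same code point, which is exactly why the broken references appear to "work" in browsers.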

> Multi level answer here:
> - text/html: Should be perfectly safe.

Yes, it depends only on the availability of fonts and on browser support for the characters used. Not all characters are supported by every browser. For example, the character referred to by &shy; (soft hyphen) isn't supported by Mozilla yet. Also, some older, obsolete browsers don't support all named entities.

> - application/xhtml+xml: Should be, but isn't, safe except for the
> five named entities of XML. Use decimal or hexadecimal character
> references instead.
> - application/xml: Only safe in validating user agents. Which doesn't
> include browsers. So, use decimal or hexadecimal character references.

There is no difference between the handling of these two MIME types: both require a validating parser to handle named entity references (other than the five predefined in XML). The exception to the rule is that some browsers, such as Mozilla, despite not implementing a validating parser, may have a pseudo-DTD catalog containing just these entity references. Mozilla uses this catalog when it encounters an XHTML DOCTYPE in an XML document, regardless of the MIME type. (It works similarly for MathML.)
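
If you'd rather not rely on a DTD catalog at all, the advice to use numeric references can be automated. The following is only a rough sketch, assuming Python 3; the function name named_to_numeric and the XML_PREDEFINED set are hypothetical names of mine, and the regular expression makes no attempt to skip CDATA sections, comments or scripts.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# The five entities predefined by XML; these are safe without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(markup: str) -> str:
    """Replace HTML 4 named entity references with decimal character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave the XML built-ins and any unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

print(named_to_numeric("<p>Tom &amp; Jerry &mdash; a soft&shy;hyphen</p>"))
```

The output keeps &amp; as it is and rewrites the other two references in decimal form, which a non-validating XML parser can handle.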

> Character references refer to Unicode code points independent of the
> document encoding and character set. At least for HTML4 and XML, if
> not for HTML3.2.

As far as HTML is concerned, character references have referred to Unicode code points ever since HTML 2.0.
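
A small sketch of what "independent of the document encoding" means in practice, again assuming Python 3 (the source string and the resolve_via helper are made up for illustration). The same markup is serialised in several encodings, decoded again, and the reference is resolved each time.

```python
import html

# &#8364; is U+20AC (euro sign), a character ISO-8859-1 cannot encode directly.
source = "price: &#8364;5"

def resolve_via(encoding: str) -> str:
    """Round-trip the markup through an encoding, then resolve its references."""
    raw_bytes = source.encode(encoding)   # how the document travels on the wire
    text = raw_bytes.decode(encoding)     # what the parser sees after decoding
    return html.unescape(text)            # references resolve to code points

results = {enc: resolve_via(enc) for enc in ("ascii", "iso-8859-1", "utf-8", "utf-16")}
for enc, resolved in results.items():
    print(f"{enc}: {resolved!r}")
```

Every encoding yields the same string, because the reference names a code point rather than a byte value in the document's encoding.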

See my article:
http://lachy.id.au/log/2005/10/char-refs
(take note of the comments too, which contain a few corrections)

--
Lachlan Hunt
http://lachy.id.au/

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************
