liorean wrote:
On 11/01/06, Kat <[EMAIL PROTECTED]> wrote:
Is it safe to use the named references that formerly referred to the control
characters?
Yes, it's safe to use the named entity references in HTML4, but it's
easier to just use UTF-8 and type the actual characters instead.
&mdash; (or any other entity reference) has never referred to a control
character. You're getting confused by the fact that IE (and now every
other HTML browser, for compatibility) incorrectly interprets the numeric
character references from &#128; to &#159; (and their hex equivalents) as
though the Document Character Set were Windows-1252. This has never been
defined in any standard; it is nothing more than widely implemented
broken behaviour.
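
To make the mismatch concrete, here is a small Python sketch (Python is my
choice for illustration, it is not part of the original discussion). It
compares what code point 128 means in Unicode with what byte 128 means in
Windows-1252, and then shows how Python's html.unescape, which documents
itself as following the HTML 5 rules for character references, resolves
&#128; and &#x9F;. Treat it as a demonstration of the browser-compatible
interpretation described above, not as a statement of what any particular
browser does internally.

```python
import html

# In Unicode, code points 128-159 are C1 control characters.
as_unicode = chr(128)                       # U+0080, a control character

# Windows-1252 assigns printable characters to most of those byte values.
as_cp1252 = bytes([128]).decode("cp1252")   # U+20AC, the euro sign

# html.unescape applies the browser-compatible remapping: the numeric
# reference is treated as if it named a Windows-1252 byte.
browser_like = html.unescape("&#128;")      # decimal form
hex_form = html.unescape("&#x9F;")          # hex form of &#159;

print(repr(as_unicode), repr(as_cp1252), repr(browser_like), repr(hex_form))
```

The portable alternative is to reference the real code point, for example
&#8364; for the euro sign, or to type the character directly in UTF-8.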
Multi level answer here:
- text/html: Should be perfectly safe.
Yes, it only depends on the availability of fonts and support for the
characters used. Not all characters are supported by every browser.
For example, the character referred to by &shy; (soft-hyphen) isn't
supported by Mozilla yet. Also, some older and obsolete browsers don't
support all named entities.
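
If an older browser's missing entity names are the concern, one option is
to emit the equivalent decimal reference instead. A minimal Python sketch,
assuming the HTML 4 entity table shipped in the standard library
(html.entities.name2codepoint); the helper name to_numeric_reference is
made up for this example:

```python
from html.entities import name2codepoint


def to_numeric_reference(name: str) -> str:
    """Return the decimal character reference for an HTML 4 entity name.

    Hypothetical helper for illustration; raises KeyError for unknown names.
    """
    return f"&#{name2codepoint[name]};"


print(to_numeric_reference("shy"))    # &#173;
print(to_numeric_reference("mdash"))  # &#8212;
```

This only changes how the character is spelled in the markup. Whether the
browser has a font for it, or supports it at all, is still a separate
question.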
- application/xhtml+xml: Should be, but isn't, safe except for the
five named entities of XML. Use decimal or hexadecimal character
references instead.
- application/xml: Only safe in validating user agents, which doesn't
include browsers. So, use decimal or hexadecimal character references.
There is no difference between the handling of the two MIME types; both
require the use of a validating parser to handle named entity
references. The exception to the rule is that some browsers, such as
Mozilla, despite not implementing a validating parser, may have a
pseudo-DTD catalog containing just these entity references. Mozilla
uses this catalog when it encounters an XHTML DOCTYPE in an XML
document, regardless of the MIME type. (It works similarly for MathML too).
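
The same behaviour can be seen in any non-validating XML parser. The sketch
below uses Python's xml.etree.ElementTree as a stand-in for such a parser
(an assumption for illustration; it is not what any browser uses). With no
DTD available, an HTML entity name is rejected, while the numeric reference
and the five predefined XML entities parse fine. The helper name parses is
made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if the non-validating parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


named = parses("<p>caf&eacute;</p>")                      # HTML entity name
numeric = parses("<p>caf&#233;</p>")                      # decimal reference
predefined = parses("<p>&lt;&amp;&gt;&quot;&apos;</p>")   # the XML five

print(named, numeric, predefined)
```

The named reference fails because nothing has declared it, which is why
numeric references are the safe choice for anything served as XML.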
Character references refer to Unicode code points independent of the
document encoding and character set, at least for HTML4 and XML, if
not for HTML3.2.
As far as character references in HTML are concerned, they have always
referred to the Unicode code points since HTML 2.0.
See my article:
http://lachy.id.au/log/2005/10/char-refs
(take note of the comments too, which contain a few corrections)
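
A short Python sketch of the encoding independence, under the assumption
that html.unescape is an acceptable stand-in for an HTML parser's reference
handling. The same text is serialised in three encodings. Where the
encoding cannot represent U+2014, Python's "xmlcharrefreplace" error
handler writes a numeric reference, and that reference is identical in each
case because it names the Unicode code point, not a byte in the document's
encoding.

```python
import html

text = "dash: \u2014"  # U+2014 is the character &mdash; refers to

# Neither ASCII nor ISO-8859-1 can encode U+2014, so it becomes a reference.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")
latin1_bytes = text.encode("iso-8859-1", "xmlcharrefreplace")
# UTF-8 can encode it directly, so no reference is needed.
utf8_bytes = text.encode("utf-8")

# Decode each document with its own encoding, then resolve references.
decoded = [
    html.unescape(ascii_bytes.decode("ascii")),
    html.unescape(latin1_bytes.decode("iso-8859-1")),
    html.unescape(utf8_bytes.decode("utf-8")),
]

print(ascii_bytes, latin1_bytes, utf8_bytes)
print(all(d == text for d in decoded))
```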
--
Lachlan Hunt
http://lachy.id.au/
******************************************************
The discussion list for http://webstandardsgroup.org/
******************************************************