Michael Everson wrote:
An Irish colleague here said he liked the article but noted that the Times' web directors don't use Unicode....

...
<meta  http-equiv="charset" content="iso-8859-1">
...

There is an alternative point of view, which holds that the charset declared in an HTML (or XML) document is no more than an encoding scheme, and that all characters in those documents are fundamentally Unicode characters (i.e. they start life with the full semantics of Unicode; they don't acquire them at the moment of character set conversion). That view is supported by the XML spec itself and by the infoset definition. And because we have numeric character references, using an iso-8859-1 encoding scheme is not really a limitation: witness this message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म DEVANAGARI LETTER MA.
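To make the point concrete, here is a minimal Python sketch using only the standard library. The helper names `to_latin1_with_ncrs` and `parse_xml_text` are illustrative, not from any spec. It encodes the two characters above into ISO-8859-1 bytes, where anything outside Latin-1 becomes a numeric character reference. It then shows that an XML parser (and HTML unescaping) hands back the original Unicode characters, so the declared charset only governs the byte encoding.

```python
import html
import xml.etree.ElementTree as ET


def to_latin1_with_ncrs(text: str) -> bytes:
    """Encode text as ISO-8859-1; characters outside Latin-1 become
    decimal numeric character references such as &#4315;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def parse_xml_text(doc: bytes) -> str:
    """Parse an XML document (honouring its encoding declaration)
    and return the root element's character data."""
    return ET.fromstring(doc).text


# U+10DB GEORGIAN LETTER MAN and U+092E DEVANAGARI LETTER MA
sample = "\u10db \u092e"

body = to_latin1_with_ncrs(sample)
print(body)  # b'&#4315; &#2350;'

# Wrap the bytes in a document that declares iso-8859-1.
doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>' + body + b"</p>"

# The parser resolves the references, so the text is plain Unicode again.
print(parse_xml_text(doc) == sample)  # True

# The same round trip for HTML text, via the standard-library unescaper.
print(html.unescape(body.decode("iso-8859-1")) == sample)  # True
```

Hexadecimal references such as `&#x10DB;` resolve to the same character. The choice of encoding changes how the characters are spelled in bytes, not which characters the document contains.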

Eric.
