Eric,

Forgive my density. I'm not sure that I understand. Are you arguing that an ASCII encoding scheme (ISO-8859-1) is not a limitation because, semantically, all of the characters (a, b, c, etc.) also exist in the Unicode scheme?
It makes sense to me that ASCII is not a limitation for those documents that are limited to that character set. But your own message, "which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म DEVANAGARI LETTER MA," triggers an error message in my own email client (Entourage X), namely: "Some text in this message is in a language that your computer cannot display." I'm not certain whether I'm seeing this because I don't possess a font to display those characters or for some other reason. I suspect that the missing font is the reason because, when I try to look up those characters in OS X's Character Palette, the Georgian and Devanagari Unicode blocks show up blank.

The observation that I, the "Irish (American) colleague," made to Michael was that there is a sentence in the NYT article displayed in my browser that dropped the U+00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., in "François"). There's nothing in the paragraph in question to indicate that there is a missing character, nor is there a numeric code displayed for a savvy user to look up. Surely in this context, we would agree that the semantic content was distorted, yes?

Sincerely,
Brian Doyle
Unicode newbie

On 9/25/03 11:54 AM, "Eric Muller" <[EMAIL PROTECTED]> wrote:

> Michael Everson wrote:
>> An Irish colleague here said he liked the article but noted that the Times'
>> web directors don't use Unicode....
>>
>>> ...
>>> <meta http-equiv="charset" content="iso-8859-1">
>>> ...
>
> There is an alternative point of view, which says that charset declared in an
> HTML (or XML) document is no more than an encoding scheme, and that all
> characters in those documents are fundamentally Unicode characters (i.e. they
> start in life with the full semantic of Unicode, they don't inherit it on the
> occasion of character set conversion). That view is supported by the XML spec
> itself, and by the infoset definition. And because we have numeric character
> entities, using an iso-8859-1 encoding scheme is not really a limitation:
> witness this message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म
> DEVANAGARI LETTER MA.
>
> Eric.
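To make the two positions in this exchange concrete, here is a minimal Python sketch. It is an illustration only, not a description of how the NYT site or any mail client actually works. The sample string and the variable names are assumptions made up for the example. It shows that text outside ISO-8859-1 can survive an iso-8859-1 encoding as numeric character references, and that a pipeline which silently discards unencodable characters leaves no trace of what was lost.

```python
import html

# Hypothetical sample text: one Georgian letter, one Devanagari letter,
# and a Latin-1 character (c with cedilla).
text = "Georgian \u10DB, Devanagari \u092E, Fran\u00E7ois"

# Characters outside ISO-8859-1 become &#NNNN; references;
# c with cedilla is in ISO-8859-1, so it is stored as the single byte 0xE7.
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A receiver decodes the bytes, then resolves the references,
# recovering the original Unicode characters.
decoded = html.unescape(encoded.decode("iso-8859-1"))

# By contrast, a lossy conversion drops characters with no visible marker.
lossy = text.encode("ascii", errors="ignore").decode("ascii")

print(encoded)
print(decoded == text)
print(lossy)
```

The round trip preserves the characters themselves, but whether a reader sees them still depends on having a font that covers them, which is a separate matter from the encoding.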