Eric,

Forgive my density. I'm not sure that I understand. Are you arguing that an
ISO-8859-1 (Latin-1) encoding scheme is not a limitation because,
semantically, all of the characters (a, b, c, etc.) also exist in the
Unicode scheme?
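If I follow the mechanism correctly, it works something like this minimal Python sketch. The sample text is my own, and Python's `xmlcharrefreplace` error handler and `html.unescape` are only stand-ins for whatever the sending and receiving software actually does:

```python
import html

# U+00E7 exists in ISO-8859-1 (Latin-1); U+10DB (Georgian) and
# U+092E (Devanagari) do not.
text = "Fran\u00e7ois \u10db \u092e"

# Encode as ISO-8859-1, turning every character the encoding cannot
# hold into a decimal numeric character reference such as &#4315;.
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(encoded)  # b'Fran\xe7ois &#4315; &#2350;'

# Decoding the bytes and then resolving the references recovers the
# original Unicode characters, so nothing was lost along the way.
decoded = html.unescape(encoded.decode("iso-8859-1"))
print(decoded == text)  # True
```

Whether the characters then actually appear on screen is a separate question of fonts, which may be what I'm running into below.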

It makes sense to me that Latin-1 is not a limitation for those documents that
are limited to that character set. But your own message, "which contains
U+10DB მ GEORGIAN LETTER MAN and U+092E म DEVANAGARI LETTER MA," triggers an
error message in my own email client (Entourage X), namely:

"Some text in this message is in a language that your computer cannot
display."

I'm not certain whether I'm seeing this because I don't have a font that can
display those characters or for some other reason. I suspect the font is the
cause, because when I try to look up those characters in OS X's Character
Palette, the Georgian and Devanagari Unicode blocks show up blank.

The observation that I, the "Irish (American) colleague," made to Michael
was that a sentence in the NYT article, as displayed in my browser, had
dropped U+00E7 LATIN SMALL LETTER C WITH CEDILLA (as in François).

There's nothing in the paragraph in question to indicate that there is a
missing character--nor is there a numeric code displayed for a savvy user to
look up.
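For what it's worth, U+00E7 has a byte of its own in ISO-8859-1 (0xE7), so the declared charset alone shouldn't force it out. The Python sketch below is purely hypothetical and makes no claim about how the Times' pages are actually produced: it only shows how a lossy conversion (here to plain ASCII) can drop a character silently, compared with error handlers that leave a visible trace.

```python
# U+00E7 LATIN SMALL LETTER C WITH CEDILLA, as in "François".
name = "Fran\u00e7ois"

# ISO-8859-1 has a byte for this character (0xE7), so nothing is lost.
assert name.encode("iso-8859-1") == b"Fran\xe7ois"

# A lossy conversion to ASCII with errors="ignore" drops the character
# silently, leaving the reader no clue that anything is missing.
print(name.encode("ascii", errors="ignore"))  # b'Franois'

# errors="replace" at least leaves a visible "?" placeholder.
print(name.encode("ascii", errors="replace"))  # b'Fran?ois'

# errors="xmlcharrefreplace" keeps the character as a numeric reference.
print(name.encode("ascii", errors="xmlcharrefreplace"))  # b'Fran&#231;ois'
```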

Surely in this context, we would agree that the semantic content was
distorted, yes?

Sincerely,
Brian Doyle
Unicode newbie


On 9/25/03 11:54 AM, "Eric Muller" <[EMAIL PROTECTED]> wrote:

> 
> 
> Michael Everson wrote:
>> An Irish colleague here said he liked the article but noted that the Times'
>> web directors don't use Unicode....
>> 
>> 
>>> ...  
>>> <meta  http-equiv="charset" content="iso-8859-1">
>>> ...  
>>> 
>>> 
> There is an alternative point of view, which says that the charset declared in an
> HTML (or XML) document is no more than an encoding scheme, and that all
> characters in those documents are fundamentally Unicode characters (i.e. they
> start life with the full semantics of Unicode; they don't inherit them on the
> occasion of character set conversion). That view is supported by the XML spec
> itself, and by the infoset definition. And because we have numeric character
> entities, using an iso-8859-1 encoding scheme is not really a limitation:
> witness this message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म
> DEVANAGARI LETTER MA.
> 
> Eric.
> 
> 


