Or did you mean "this is UTF-8 even though it only has characters that also 
look like ASCII?"  I was a bit confused :)

If you are communicating this information, then that's probably a good 
time to also communicate "Use Unicode, like UTF-8, and you won't have this kind 
of problem!"

-Shawn

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Asmus Freytag
Sent: Wednesday, November 10, 2010 12:39 PM
To: Jim Monty
Cc: unicode@unicode.org
Subject: Re: Is there a term for 
strictly-just-this-encoding-and-not-really-that-encoding?

If you want to get that point across to a general audience, you could use a 
more colloquial term, albeit one that itself derives from mathematics.

Text that can be completely expressed in ASCII fits into something
(ASCII) that works as a "lowest common denominator" of a large number of 
character sets.

You could call it "lowest common denominator" text.

Since ASCII is the only set that exhibits such a lowest common denominator 
relationship with enough other sets to make it interesting, and since that 
relation is so well known, it's usually enough to just refer to it by name 
(ASCII) without needing a general term - except perhaps for general audiences 
that aren't very familiar with it.

In these kinds of discussions I find it invariably useful to mention that the 
copyright sign is not part of ASCII. (I suspect that it's the most common 
character that makes a text lose its "lowest common denominator" 
status.)
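
To make the copyright-sign point concrete, here is a minimal Python sketch. 
The function name is invented for illustration only; it is not a standard 
term or API. It assumes "lowest common denominator" means the text encodes to 
the same bytes in ASCII, ISO 8859-1 and UTF-8.

```python
def is_lowest_common_denominator(text: str) -> bool:
    """Return True if text encodes to identical bytes in ASCII,
    ISO 8859-1 and UTF-8 (hypothetical helper, for illustration)."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character is outside ASCII.
        return False
    return ascii_bytes == text.encode("iso-8859-1") == text.encode("utf-8")


plain = "Copyright (c) 2010"
fancy = "Copyright \u00a9 2010"  # U+00A9 COPYRIGHT SIGN

print(is_lowest_common_denominator(plain))  # True
print(is_lowest_common_denominator(fancy))  # False

# The same character yields different bytes once it leaves ASCII:
print(fancy.encode("iso-8859-1"))  # b'Copyright \xa9 2010'
print(fancy.encode("utf-8"))       # b'Copyright \xc2\xa9 2010'
```

With the copyright sign present, the ISO 8859-1 and UTF-8 byte sequences 
diverge, so the text can no longer be read interchangeably as either.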

A./





On 11/10/2010 11:41 AM, Jim Monty wrote:
> Here's a peculiar question.
>
> Is there a standard term to describe text that is in some subset CCS 
> of another CCS but, strictly speaking, is only really in the subset 
> CCS because it doesn't have any characters in it other than those represented 
> in the smaller CCS?
>
> (The fact that I struggled to phrase this question in a way that made 
> my meaning clear -- and failed -- is precisely my dilemma.)
>
> Text that has in it only characters that are in the ASCII character 
> encoding is also in the ISO 8859-1 character encoding and the
> UTF-8 character encoding form of the Unicode coded character set, 
> right? I often need to talk and write about text that has such 
> multiple personalities, but I invariably struggle to make my point 
> clearly and succinctly. I wind up describing the notion of it in awkwardly 
> verbose detail.
>
> So I'm left wondering if the character encoding cognoscenti have a 
> special utilitarian word for this, maybe one borrowed from mathematics (set 
> theory).
>
> Jim Monty
>
>
>
>




