Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

Martin J. Dürst Wed, 10 Nov 2010 20:08:09 -0800

On 2010/11/11 6:28, Mark Davis ☕ wrote:

That is actually not the case. There are superset relations among some of
the CJK character sets, and also -- practically speaking -- between some of
the windows and ISO-8859 sets. I say practically speaking because in general
environments, the C1 controls are really unused, so where a non ISO-8859 set
is same except for 80..9F you can treat it pragmatically as a superset.

Yes, except that the terms superset/subset (and set in general) shouldn't be used unless you really strictly speak about the repertoire of characters, and not the encoding itself. So e.g. the repertoire of iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1 is not a subset of UTF-8 (not because you can't label some text encoded as iso-8859-1, but because subset relationships among the encodings themselves don't make sense). Also, US-ASCII is not a subset of UTF-8, because when you just use the names of the character encodings, you mean the character encodings, and character encodings don't have subset relationships.
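To make the repertoire point concrete, here is a small Python sketch. The helper names (repertoire, encodable) are made up for illustration, and it only handles single-byte encodings by trying each byte value with Python's built-in codecs. It suggests that the characters of iso-8859-1 all exist in UTF-8, while the bytes used for the same character can differ.

```python
def repertoire(encoding):
    """Characters reachable from single bytes of a single-byte encoding."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(encoding))
        except UnicodeDecodeError:
            pass  # this byte is not assigned in the encoding
    return chars


def encodable(ch, encoding):
    """True if the character ch can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


latin1 = repertoire("iso-8859-1")

# Repertoire view: every iso-8859-1 character can also be written in UTF-8.
repertoire_is_subset = all(encodable(ch, "utf-8") for ch in latin1)

# Encoding view: the same character is not the same bytes.
same_bytes = "é".encode("iso-8859-1") == "é".encode("utf-8")
```

Under these assumptions repertoire_is_subset comes out true and same_bytes comes out false, which is the distinction between repertoire and encoding drawn above.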

It may as well be possible to use (create?) the term sub-encoding, saying that an encoding A is a sub-encoding of encoding B if all (legal) byte sequences in encoding A are also legal byte sequences in encoding B and are interpreted as the same characters in both cases. In this sense, US-ASCII is clearly a sub-encoding of UTF-8, as well as a sub-encoding of many other encodings. You can also say that iso-8859-1 is a sub-encoding of windows-1252 if the former is interpreted as not including the C1 range.
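A rough way to test this definition, sketched in Python under some loud assumptions: the function name is_sub_encoding and its exclude parameter are hypothetical, encoding A is taken to be a single-byte encoding, and only one-byte sequences are compared. That is a necessary condition in general, and should be enough when B is stateless, but it is not a proof for stateful encodings.

```python
def is_sub_encoding(a, b, exclude=range(0)):
    """Rough check of the sub-encoding idea for a single-byte encoding `a`.

    Every byte that is legal in `a` (and not listed in `exclude`) must be
    legal in `b` and decode to the same character there.
    """
    for value in range(256):
        if value in exclude:
            continue  # treated as not being part of encoding A
        byte = bytes([value])
        try:
            char_a = byte.decode(a)
        except UnicodeDecodeError:
            continue  # not legal in A, so it imposes no requirement on B
        try:
            char_b = byte.decode(b)
        except UnicodeDecodeError:
            return False  # legal in A but not in B
        if char_a != char_b:
            return False  # legal in both, but interpreted differently
    return True


C1_RANGE = range(0x80, 0xA0)  # the 80..9F bytes mentioned in the quoted text

ascii_in_utf8 = is_sub_encoding("ascii", "utf-8")
latin1_in_utf8 = is_sub_encoding("iso-8859-1", "utf-8")
latin1_in_cp1252 = is_sub_encoding("iso-8859-1", "cp1252")
latin1_no_c1_in_cp1252 = is_sub_encoding("iso-8859-1", "cp1252", exclude=C1_RANGE)
```

With Python's codecs, the first and last checks hold and the middle two fail, matching the US-ASCII and windows-1252 examples above.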


Regards,   Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:due...@it.aoyama.ac.jp
