John Cowan asked:

> Doug Ewell scripsit:
>
> > > So is [VIQR] a 7-bit encoding, or a scheme layered on top of ASCII?
> >
> > It's a scheme layered on top of ASCII
>
> > > And what is KOI-7?
> >
> > A true 7-bit encoding for Russian, in which Cyrillic letters (small and
> > capital respectively) were encoded in the ranges where ASCII has Latin
> > letters (capital and small respectively).
>
> Ah.  And on what principle do you distinguish them?

VIQR uses (for example) a sequence of two ASCII characters 'd' + 'd'
to represent, conventionally, the Vietnamese barred-d, i.e.,
U+0111 LATIN SMALL LETTER D WITH STROKE. However, that is a convention
for the use of a sequence of two ASCII characters -- not a direct
encoding of the character. It is correct (and appropriate) to display
VIQR with an ASCII font, in conformance with the ASCII standard. People
then learn to interpret the various sequences of letters, or of letters
plus ASCII punctuation and symbols, as representing "real" Vietnamese
orthography.

KOI-7, on the other hand, is an encoded character set. The *definition*
of the code points is as representing the Cyrillic letters. 0x40 encodes
CYRILLIC SMALL LETTER YU. It is not AT SIGN masquerading as YU. It is
correct (and appropriate) to display KOI-7 with a KOI-7 font, in
conformance with the KOI-7 standard; it is *not* correct to display it
with an ASCII font.

The fact that KOI-7 was designed the way it was, to make it feasible to
do Cyrillic on devices that could only handle ASCII data, is beside the
point -- it was simply a clever way to get around the then 7-bit
limitations of devices.

> The IETF clearly
> treats them both as charsets, within its definitions.

The IETF definition of "charset" is underdetermined for distinguishing
these kinds of cases. Any specification that allows you to map
unambiguously from a sequence of bytes to a sequence of abstract
characters is, potentially, considered a "charset" in the IETF sense,
right? As such, it cannot readily distinguish between true coded
character sets and conventional orthographies built on top of ASCII,
for example.

--Ken
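
The distinction can be made concrete with a minimal sketch in Python.
Everything named here (VIQR_SEQUENCES, KOI7_CODE_POINTS, decode_viqr,
decode_koi7) is hypothetical and exists only for illustration, and the
tables hold just the two mappings mentioned above, not the full VIQR or
KOI-7 repertoires. The point is the shape of the two decoders: VIQR
decodes as ASCII first, with the Vietnamese reading as an optional
second layer, while KOI-7 maps each code point straight to a Cyrillic
letter with no ASCII step at all.

```python
# Hypothetical, deliberately tiny tables: only the two mappings named in
# the message above. Real VIQR and KOI-7 tables are much larger.
VIQR_SEQUENCES = {"dd": "\u0111"}    # 'd' + 'd' -> LATIN SMALL LETTER D WITH STROKE
KOI7_CODE_POINTS = {0x40: "\u044e"}  # 0x40 -> CYRILLIC SMALL LETTER YU


def decode_viqr(data: bytes) -> tuple:
    """Return (ASCII reading, conventional Vietnamese reading)."""
    # Layer 1: the bytes really are ASCII, and this reading is a correct
    # display of the data in its own right.
    ascii_text = data.decode("ascii")
    # Layer 2: a convention applied on top of the ASCII characters.
    vietnamese = ascii_text
    for sequence, letter in VIQR_SEQUENCES.items():
        vietnamese = vietnamese.replace(sequence, letter)
    return ascii_text, vietnamese


def decode_koi7(data: bytes) -> str:
    """Map each code point directly to its character; no ASCII layer."""
    # Code points missing from this incomplete table become U+FFFD.
    return "".join(KOI7_CODE_POINTS.get(byte, "\ufffd") for byte in data)
```

For example:

```python
ascii_reading, vietnamese_reading = decode_viqr(b"dd")
print(ascii_reading)              # dd
print(vietnamese_reading)         # the single letter U+0111
print(decode_koi7(b"\x40"))       # the single letter U+044E
print(b"\x40".decode("ascii"))    # @  (the ASCII reading, wrong for KOI-7)
```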