Richard Wordingham wrote:

Just to complicate matters, most documents encoded using ISO/IEC 2022
rely on default initial settings, and so to interpret them it is not
enough to say it is in an ISO/IEC 2022 encoding, but instead one must
specify the particular encoding, which then defines the initial
states.

ISO 2022 does require a particular initial state, but the ones Richard is talking about are specific to ISO 2022-based encodings, such as ISO-2022-CN or ISO-2022-JP. Those are really different encodings from generic ISO 2022; in addition to the secret magic initial state, they may also allow certain shortcuts in the switching characters which aren't allowed in fully conformant 2022.

Asmus Freytag wrote:

ISO 2022 allows switching among sets in mid stream, but as far as I
remember (haven't had to think about this since Unicode came around)
the code unit is still a byte, except that sometimes pairs of bytes
are used. As I remember, ISO 2022 was still far from widely supported
in the late 80's and practically not at all on the fast growing PC
sector.

ISO 2022 code units are indeed bytes, even for the double- or (theoretical) triple-byte sets, and it was indeed almost never used on PCs.
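
To make the "bytes, sometimes in pairs" point concrete, here is a small sketch, again assuming Python's "iso2022_jp" codec and basic ISO-2022-JP with three-byte escape sequences. The function split_units is a hypothetical helper, not part of any library. It walks the byte stream and groups bytes into characters, one byte at a time in ASCII mode and two at a time after a double-byte set has been designated.

```python
def split_units(data):
    """Group the bytes of a basic ISO-2022-JP stream into characters.

    Assumptions: the stream starts in single-byte ASCII mode, and every
    escape sequence is 3 bytes; ESC $ ... selects a double-byte set and
    ESC ( ... selects a single-byte set.
    """
    units = []
    width = 1  # implied initial state: ASCII, one byte per character
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            designator = data[i:i + 3]
            width = 2 if designator[1:2] == b"$" else 1
            i += 3
        else:
            units.append(data[i:i + width])
            i += width
    return units

encoded = "A日本".encode("iso2022_jp")
units = split_units(encoded)

print([len(u) for u in units])         # one single byte, then two pairs
print(all(b < 0x80 for b in encoded))  # every code unit is a 7-bit byte
```

The code unit never changes size; only the number of bytes consumed per character does.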

I think it's important to remember that Roger's original question to the list was "Can a single text document use multiple character encodings?" He didn't ask if such a practice was common, or confusing, or a good idea, though perhaps those were underlying questions.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
