Richard Wordingham wrote:
Just to complicate matters, most documents encoded using ISO/IEC 2022
rely on default initial settings, and so to interpret them it is not
enough to say it is in an ISO/IEC 2022 encoding, but instead one must
specify the particular encoding, which then defines the initial
states.
ISO 2022 does require a particular initial state, but the ones Richard
is talking about are specific to ISO 2022-based encodings, such as
ISO-2022-CN or ISO-2022-JP. Those are really different encodings from
generic ISO 2022; in addition to the secret magic initial state, they
may also allow certain shortcuts in the switching characters that
aren't allowed in fully conformant ISO 2022.
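
To make the initial-state point concrete, here is a minimal sketch using
Python's built-in "iso2022_jp" codec (my choice of tool for illustration,
not something from the thread). It assumes the ISO-2022-JP convention from
RFC 1468 that text starts out in ASCII, so the same four bytes mean
different things depending on whether a switching sequence precedes them.
The names ESC, TO_JIS_X_0208, TO_ASCII and decode_jp are made up for this
example.

```python
# Sketch: the same bytes decode differently depending on the shift state.
# Assumes Python's stdlib "iso2022_jp" codec; helper names are hypothetical.

ESC = b"\x1b"
TO_JIS_X_0208 = ESC + b"$B"  # switch to the double-byte JIS X 0208 set
TO_ASCII = ESC + b"(B"       # switch back to ASCII


def decode_jp(data: bytes) -> str:
    """Decode bytes as ISO-2022-JP, which begins in the ASCII state."""
    return data.decode("iso2022_jp")


raw = b"F|K\\"  # four 7-bit bytes: 0x46 0x7C 0x4B 0x5C

# No escape sequence: the encoding's default initial state (ASCII) applies.
as_initial_state = decode_jp(raw)

# Same bytes after an explicit switch: read as two double-byte characters.
as_double_byte = decode_jp(TO_JIS_X_0208 + raw + TO_ASCII)

print(repr(as_initial_state))
print(repr(as_double_byte))
```

A decoder that knew only "this is ISO 2022" and not "this is ISO-2022-JP"
would have no agreed starting point for the first call.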
Asmus Freytag wrote:
ISO 2022 allows switching among sets in mid stream, but as far as I
remember (haven't had to think about this since Unicode came around)
the code unit is still a byte, except that sometimes pairs of bytes
are used. As I remember, ISO 2022 was still far from widely supported
in the late 80's and practically not at all on the fast growing PC
sector.
ISO 2022 code units are indeed bytes, even for the double- or
(theoretical) triple-byte sets, and it was indeed almost never used on
PCs.
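
As a rough illustration of the byte-sized code units, the following sketch
again leans on Python's "iso2022_jp" codec (an assumption for the sake of
example; any ISO 2022-based codec would do). It encodes a string that mixes
ASCII with two kanji and checks that every code unit in the result is a
7-bit byte, with the kanji carried as pairs of bytes between the switching
sequences. The variable names are hypothetical.

```python
# Sketch: ISO-2022-JP output is a stream of bytes, even for double-byte sets.
# Assumes Python's stdlib "iso2022_jp" codec.

text = "A\u65e5\u672cB"  # "A", two kanji, "B"
encoded = text.encode("iso2022_jp")

# Every code unit is a single byte with the high bit clear.
all_seven_bit = all(b < 0x80 for b in encoded)

# The switch into the double-byte set and back happens mid-stream.
has_switch_to_double_byte = b"\x1b$B" in encoded
has_switch_to_ascii = b"\x1b(B" in encoded

print(encoded)
print(len(encoded), all_seven_bit)
```

Four characters come out as twelve bytes here: two single ASCII bytes, two
byte pairs for the kanji, and two three-byte switching sequences.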
I think it's important to remember that Roger's original question to the
list was "Can a single text document use multiple character encodings?"
He didn't ask if such a practice was common, or confusing, or a good
idea, though perhaps those were underlying questions.
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell