Re: Can a single text document use multiple character encodings?

Doug Ewell Wed, 28 Aug 2013 18:38:58 -0700

Richard Wordingham wrote:

Just to complicate matters, most documents encoded using ISO/IEC 2022
rely on default initial settings, and so to interpret them it is not
enough to say it is in an ISO/IEC 2022 encoding, but instead one must
specify the particular encoding, which then defines the initial
states.

ISO 2022 does require a particular initial state, but the ones Richard
is talking about are specific to ISO 2022-based encodings, such as
ISO-2022-CN or ISO-2022-JP. Those are really different encodings from
generic ISO 2022; in addition to the secret magic initial state, they
may also allow certain shortcuts in the switching characters which
aren't allowed in fully conformant 2022.
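
As a rough illustration of the implied initial state, here is a minimal
sketch using Python's built-in "iso2022_jp" codec. The sample string is
made up for the example. The point is only that the leading ASCII run
needs no designation at all, while the Japanese run is bracketed by
escape sequences:

```python
# Sketch only: relies on Python's standard "iso2022_jp" codec.
# The sample text is an arbitrary choice for illustration.
text = "abc 日本語 xyz"
encoded = text.encode("iso2022_jp")

# ISO-2022-JP assumes ASCII at the start, so no escape sequence
# precedes "abc ".
starts_in_ascii = encoded.startswith(b"abc ")

# ESC $ B designates JIS X 0208-1983; ESC ( B returns to ASCII.
switches_to_jis = b"\x1b$B" in encoded
returns_to_ascii = encoded.endswith(b"\x1b(B xyz")

print(encoded)
print(starts_in_ascii, switches_to_jis, returns_to_ascii)
```

A decoder that knew only "this is ISO 2022" and not "this is
ISO-2022-JP" would have no basis for assuming that ASCII starting state.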


Asmus Freytag wrote:

ISO 2022 allows switching among sets in mid stream, but as far as I
remember (haven't had to think about this since Unicode came around)
the code unit is still a byte, except that sometimes pairs of bytes
are used. As I remember, ISO 2022 was still far from widely supported
in the late 80's and practically not at all on the fast growing PC
sector.

ISO 2022 code units are indeed bytes, even for the double- or
(theoretical) triple-byte sets, and it was indeed almost never used on
PCs.

I think it's important to remember that Roger's original question to the
list was "Can a single text document use multiple character encodings?"
He didn't ask if such a practice was common, or confusing, or a good
idea, though perhaps those were underlying questions.


--
Doug Ewell | Thornton, CO, USA

http://ewellic.org | @DougEwell
