Fundamentally, when Unicode "unifies" characters it often does so "across sources". For example, ordinary ASCII letters are unified across character sets, even if one legacy platform shows a somewhat different pixel arrangement for a letter than another platform does.

The most common reason for Unicode to disunify characters is that the *same* source shows both characters as distinct.

These same considerations apply to compatibility characters.

The primary goal in encoding compatibility characters is to allow data to round-trip from the source to systems operating in Unicode and back. It is a non-goal to be able to tell from the Unicode code point which legacy platform the character was mapped from or is being mapped to.
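As a rough sketch of what that round-trip requirement means, here is a Python fragment with a made-up two-entry mapping table (the byte values and code points are purely illustrative): mapping a legacy byte into Unicode and back must be the identity.

    # Hypothetical two-entry mapping for a legacy character set; real
    # mapping tables are maintained per source set.
    LEGACY_TO_UNICODE = {0x41: "A", 0x7F: "\U0001FB00"}
    UNICODE_TO_LEGACY = {u: b for b, u in LEGACY_TO_UNICODE.items()}

    def round_trips(byte_value: int) -> bool:
        """True if legacy -> Unicode -> legacy reproduces the byte."""
        return UNICODE_TO_LEGACY[LEGACY_TO_UNICODE[byte_value]] == byte_value

    assert all(round_trips(b) for b in LEGACY_TO_UNICODE)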

The evidence required to support a request for disunification therefore always consists of a document or screenshot (usually something other than a character set table) showing that the two characters are distinct in their source environment and that the distinction matters (for example, that it cannot be determined mechanically from context).

From the original document (section 1, page 1), it looks like there are two characters that are distinct in the source but have been mapped to the same Unicode character, U+1CE2B. I can certainly sympathize with the view that unifying these based on their close visual similarity was what we used to call a case of "arm's-length" unification.

In this example, a character stream encoding the pieces used to represent a particular run of text in the "large character mode" would not round-trip reliably: after a round trip (with a real device), the displayed characters would look subtly different. For data processed transiently through Unicode, the loss of round-tripping means the data stream changes without a change in contents, which is precisely what compatibility characters are designed to avoid. For a live terminal emulator, the effect would be a small degradation in the fidelity of the emulation. There is no simple workaround, as analyzing the fragments in what amounts to a 2-D text display is not without challenges.
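To make the loss concrete, here is a minimal sketch (the byte values 0x60 and 0x61 are hypothetical stand-ins for the two source characters that were unified into U+1CE2B):

    # Two distinct source characters, unified into one Unicode code point.
    LEGACY_TO_UNICODE = {0x60: "\U0001CE2B", 0x61: "\U0001CE2B"}

    # The reverse mapping can keep only one byte value per code point...
    UNICODE_TO_LEGACY = {}
    for b, u in LEGACY_TO_UNICODE.items():
        UNICODE_TO_LEGACY[u] = b  # the later entry overwrites the earlier one

    # ...so one of the two characters no longer round-trips: the data
    # stream changes even though the contents did not.
    lost = [b for b, u in LEGACY_TO_UNICODE.items() if UNICODE_TO_LEGACY[u] != b]
    print([hex(b) for b in lost])  # ['0x60']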

I can understand the submitter's frustration at being told that there is an arbitrary limitation on fidelity and that some degradation should be seen as acceptable. While the difference is not visually prominent, the disposition needlessly violates source separation for a single character.


For the examples involving block characters, it is unclear whether they involve issues of unification within a source or across sources. If the unification is across sources (platforms), then knowledge of the target platform can be used to adjust the glyph being displayed, and there is no issue. The same is true for any SHIFT mode in a source character set, because whether the device operates in shifted mode must be known anyway and already affects what is displayed for a given byte value in the source character set (see the sketch below).
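A minimal sketch of why a known shift state makes reuse of byte values harmless (the byte value, mode names, and glyph choices here are invented for illustration):

    # The same byte value displays differently depending on the shift
    # state, but the shift state is part of the device state and is
    # therefore always known to the emulator.
    GLYPHS = {
        (0x5B, "unshifted"): "\u2588",  # FULL BLOCK, illustrative
        (0x5B, "shifted"):   "\u2592",  # MEDIUM SHADE, illustrative
    }

    def glyph_for(byte_value: int, shift_state: str) -> str:
        """Select the displayed glyph from the byte value plus the known mode."""
        return GLYPHS[(byte_value, shift_state)]

    assert glyph_for(0x5B, "unshifted") != glyph_for(0x5B, "shifted")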

I cannot tell whether the Script Encoding disposition violates source separation or merely suggests reuse of character codes for multiple sources/modes in a way that may be amenable to disambiguation with additional but available context information.

A./
