Fundamentally, when Unicode "unifies" characters it often does so "across sources". For example, ordinary ASCII letters are unified across character sets, even if one legacy platform shows a somewhat different pixel arrangement for a letter than another platform does.

The most common reason for Unicode to disunify characters is that the *same* source shows both characters as distinct.

These same considerations apply to compatibility characters.

The primary goal in encoding compatibility characters is to allow data to round-trip from the source to systems operating in Unicode and back. It is a non-goal to be able to tell from the Unicode code point which legacy platform the character was mapped from or is being mapped to.
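As a rough sketch of what that round-trip requirement means, here is a Python fragment with a made-up two-entry mapping table (the byte values and code points are purely illustrative): mapping a legacy byte into Unicode and back must be the identity.

    # Hypothetical two-entry mapping for a legacy character set; real
    # mapping tables are maintained per source set.
    LEGACY_TO_UNICODE = {0x41: "A", 0x7F: "\U0001FB00"}
    UNICODE_TO_LEGACY = {u: b for b, u in LEGACY_TO_UNICODE.items()}

    def round_trips(byte_value: int) -> bool:
        """True if legacy -> Unicode -> legacy reproduces the byte."""
        return UNICODE_TO_LEGACY[LEGACY_TO_UNICODE[byte_value]] == byte_value

    assert all(round_trips(b) for b in LEGACY_TO_UNICODE)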

The evidence required to support a request for disunification therefore always consists of a document or screenshot (usually something other than a character set table) showing that the two characters are distinct in their source environment and that the distinction matters (for example, that it cannot be determined mechanically from context).

From the original document (section 1, page 1), it looks like there are two characters that are distinct in the source but have been mapped to the same Unicode character, U+1CE2B. I can certainly sympathize with the view that unifying these based on their close visual similarity was what we used to call a case of "arm's-length" unification.

In this example, a character stream encoding the pieces used to represent a particular run of text in the "large character mode" would not round-trip reliably: after a round trip (with a real device), the displayed characters would look subtly different. For data processed transiently through Unicode, the loss of round-tripping means the data stream changes without a change in contents, which is precisely what compatibility characters are designed to avoid. For a live terminal emulator, the effect would be a small degradation in the fidelity of the emulation. There is no simple workaround, as analyzing the fragments in what amounts to a 2-D text display is not without challenges.
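To make the loss concrete, here is a minimal sketch (the byte values 0x60 and 0x61 are hypothetical stand-ins for the two source characters that were unified into U+1CE2B):

    # Two distinct source characters, unified into one Unicode code point.
    LEGACY_TO_UNICODE = {0x60: "\U0001CE2B", 0x61: "\U0001CE2B"}

    # The reverse mapping can keep only one byte value per code point...
    UNICODE_TO_LEGACY = {}
    for b, u in LEGACY_TO_UNICODE.items():
        UNICODE_TO_LEGACY[u] = b  # the later entry overwrites the earlier one

    # ...so one of the two characters no longer round-trips: the data
    # stream changes even though the contents did not.
    lost = [b for b, u in LEGACY_TO_UNICODE.items() if UNICODE_TO_LEGACY[u] != b]
    print([hex(b) for b in lost])  # ['0x60']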

I can understand the submitter's frustration at being told that there is an arbitrary limitation on fidelity and that some degradation should be seen as acceptable. While the difference is not visually prominent, the disposition needlessly violates source separation for a single character.


For the examples involving block characters, it is unclear whether they involve issues of unification within a source or across sources. If the unification is across sources (platforms), then knowledge of the target platform can be used to adjust the glyph being displayed, and there is no issue. The same is true for any SHIFT mode in a source character set, because whether the device operates in shifted mode must be known anyway and already affects what is displayed for a given byte value in the source character set (see the sketch below).
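A minimal sketch of why a known shift state makes reuse of byte values harmless (the byte value, mode names, and glyph choices here are invented for illustration):

    # The same byte value displays differently depending on the shift
    # state, but the shift state is part of the device state and is
    # therefore always known to the emulator.
    GLYPHS = {
        (0x5B, "unshifted"): "\u2588",  # FULL BLOCK, illustrative
        (0x5B, "shifted"):   "\u2592",  # MEDIUM SHADE, illustrative
    }

    def glyph_for(byte_value: int, shift_state: str) -> str:
        """Select the displayed glyph from the byte value plus the known mode."""
        return GLYPHS[(byte_value, shift_state)]

    assert glyph_for(0x5B, "unshifted") != glyph_for(0x5B, "shifted")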

I cannot tell whether the Script Encoding disposition violates source separation or merely suggests reuse of character codes for multiple sources/modes in a way that may be amenable to disambiguation with additional but available context information.

A./
