On 12/26/2025 12:40 AM, [email protected] via Unicode wrote:
When I say Unicode 'supports' a character set I mean that Unicode includes all characters in that character set.

The minimal requirement is that you can express external data (created in that character set) in a unique way as a stream of Unicode characters so that when you re-export the same data no distinctions have been lost.
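As a minimal sketch of that round-trip requirement: the check below looks for two legacy codes that map to the same Unicode string, which is exactly the case where re-export would lose a distinction. The function name `find_collisions` and the sample tables are hypothetical and used only for illustration; they are not taken from any real terminal character set.

```python
def find_collisions(table):
    """Return (first_code, second_code, text) for legacy codes sharing one mapping.

    `table` maps a legacy code (int) to the Unicode string chosen for it.
    An empty result means the mapping can be inverted, so re-export is lossless.
    """
    seen = {}          # Unicode string -> first legacy code that used it
    collisions = []
    for code, text in sorted(table.items()):
        if text in seen:
            collisions.append((seen[text], code, text))
        else:
            seen[text] = code
    return collisions


# Hypothetical tables, for illustration only.
lossless = {0x41: "A", 0x42: "B"}
lossy = {0x41: "A", 0x61: "A"}   # two distinct legacy codes collapsed into one

print(find_collisions(lossless))
print(find_collisions(lossy))
```

An empty list for a table means every legacy code can be recovered from its Unicode representation.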

The next requirement is that characters that are clearly analogous to existing Unicode characters are unified. That ensures that generic algorithms can be used to process the content part of such externally generated data while it is encoded in Unicode.

This also implies that if the same content is created in different external data sets the data can be compared and equal content gets equal representation.
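To illustrate with two well-known legacy sets (chosen here as an example, not drawn from the VT330 discussion): ISO 8859-1 and code page 437 store "é" at different byte values, but because both are unified with the same Unicode character, the decoded text compares equal. This sketch assumes only Python's standard `latin-1` and `cp437` codecs.

```python
# The same word as it would be stored in two different legacy character sets.
latin1_bytes = b"caf\xe9"   # "é" is 0xE9 in ISO 8859-1
cp437_bytes = b"caf\x82"    # "é" is 0x82 in code page 437

from_latin1 = latin1_bytes.decode("latin-1")
from_cp437 = cp437_bytes.decode("cp437")

# Different bytes on input, identical Unicode representation after conversion.
same = from_latin1 == from_cp437
print(latin1_bytes != cp437_bytes, same)
```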

There are edge cases where characters are less a part of "text content" and more specific to a given external environment (and thus make no sense if taken from one external environment, translated via Unicode and shown in another such external environment). (I'm assuming all those external sets are "legacy" environments that have ways to actually display data that are very different from normal Unicode text documents.)



    Note that a "grid of character cells" is a two-dimensional layout
    of glyphs. That is different from the Unicode character-glyph
    model and plain text. If a system wants to reproduce VT330/VT340
    behaviour, it needs to layer a higher-level protocol on top of
    Unicode plain text. So, don't expect Unicode plain text and
    character-glyph models to reproduce a VT330. And, the higher-level
    protocol can specify use of a font which has the glyph shapes and
    alignments that fit the VT330 behaviour.

The character grid itself is different from plain text, but individual characters within that grid still need to correspond to plain text characters.

Yes and no.

We discussed the 3 and 5 story summation operator made from glyph pieces. This is conceptually similar to how layout systems for "regular" text documents use glyph pieces for large fences and integral signs. After looking at the details, it seems well motivated to extend the existing 2 story summation operator to the 3 and 5 story versions by encoding additional glyph pieces. This may even be useful outside the VT330 environment.

I think there's a strong case to encode these.


I'm less convinced in the case of box drawing characters unless someone can provide a reasonable scenario where it matters. The distinction is that these do not appear to be "text content" in the same way as the glyph pieces used in mathematical display equations that we discussed.

If the only expected use of these characters is within terminal emulation software, then the only requirement that has to be satisfied is that the mapping is "unique and unambiguous". There's not an equally strong requirement to represent precise semantics because those semantics would not ever need to be transportable.

A useful test would be whether assignment to private use characters would have any effect on dealing with these specific characters. Private use characters are tied to specific fonts, and if specific fonts are always tied to a given terminal emulator (if for no other reason than to get the proportions of the "cells" correct) then the effect of using private use characters is not observable.

(This assumes that no generic Unicode-based processing takes place and the drawing is handled by the emulator, for example.)
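One way to see what a generic process can and cannot observe is to compare the properties it gets for a standard box drawing character and for a private use code point. The sketch below uses Python's standard `unicodedata` module; the choice of U+E000 as the private use stand-in is arbitrary, and the helper `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (general category, character name) as a generic process would see them."""
    return unicodedata.category(ch), unicodedata.name(ch, "<no name>")


standard = describe("\u2500")   # an encoded box drawing character
private = describe("\ue000")    # an arbitrary private use code point

print(standard)
print(private)
```

The private use code point carries only the general category `Co` and no name, so any meaning it has comes from a private agreement, such as the emulator and its font.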

Making the case for encoding these is logically equivalent to making the case that encoding them as private use characters would actually impair their usability. That boils down to proving that, by necessity, they are being interpreted by processes that would not be expected to be part of a private agreement on the meaning of these private use characters.

Terminal emulators for a specific terminal, by contrast, could be subject to a private agreement without loss of generality.

So, unless the case is made that these have to be interchangeable in an interpretable way, along with an explanation of why that is, there is a difficult row to hoe in trying to convince SEW that the issue warrants disunifying the line-drawing characters.

A./
