What to do if a legacy compatibility character is defective?

[email protected] via Unicode Fri, 24 Oct 2025 05:48:36 -0700
No, I'm not talking about U+0149, which was marked as deprecated but is in
fact a legitimate compatibility character and is not defective, as it is the
only reasonable way to represent the byte 0xF3 in a CP853 character cell.

I am aware that this issue has already been discussed many times on this
mailing list, but I still have not received a proper explanation of how
exactly the existing characters 1FB70–1FB81 1FBB5–1FBB8 1FBBC are intended to
be used in the context of certain legacy computing platforms. As it is, I
consider those characters defective.

For context, in L2/25-037, I have identified a
fundamental defect in how PETSCII, Apple II, and HP 264x characters were
encoded in Unicode. The box drawing characters (which depend on the typeface
weight) and the block elements (which depend on fractions of bounding box size)
were unified with each other, which in some cases contradicted the source
legacy platforms. The Unicode 13.0 mapping table of PETSCII and Apple II
characters relied on the assumption that the thickness of light box drawing
characters is equal to 1÷8 of the width or height of the character. This
assumption is incorrect in the case of the C64 version of PETSCII (where the
thickness is 1÷4 of the width and height) and of the Apple II (where the
thickness is 1÷7 of the width and 1÷8 of the height). In the case of HP 264x,
two of the characters that were unified to the same Unicode character were
found to have not only distinct glyphs but also distinct types of box drawing
connections, and both characters occur within the same encoding, leaving the
Unicode mapping incomplete.

The response to this proposal in L2/25-010 is fundamentally
logically incorrect and does not provide any feedback whatsoever. In that
response, terms like 'differences in plain text', 'glyph distinctions',
'character identities' or 'appropriate fonts' are thrown around as buzzwords,
completely defying all logic. The proposal already thoroughly explains why the
Unicode 13.0–17.0 mapping is defective and why the proposed characters have a
completely different identity from existing characters, which also makes the
problem impossible to resolve with appropriate fonts.

However, what makes this especially problematic is that some of the Unicode
characters were encoded for compatibility with legacy platforms, but the
fundamental character identity that the characters were encoded with is not
compatible with the original identity of the characters in the source
platform.

The characters 1FB70–1FB7F, according to the L2/19-025
compatibility table (19025-aux-LegacyComputingSources.pdf), were encoded for
compatibility with PETSCII, but their character identity as specified in
Unicode is defined in terms of 1÷8 blocks. This already makes the characters
incompatible with the C64 version of PETSCII. The characters also fit into the
1÷8 blocks encoded in 2581 258F 2594–2595, but PETSCII includes both light box
drawings and fractions of blocks, and those characters are where the two
groups of characters 'intersect', which causes the true top/bottom (but not
left/right) light box drawings to be mapped to different values, as I already
thoroughly explained in L2/25-037. However, the PETSCII character 0x5D is
mapped to both U+2502 and U+1FB73, and the PETSCII character 0x40 is mapped to
both U+2500 and U+1FB79. Yet in legacy computing text modes, all character
tiles have a 1∶1 mapping to a fixed-size region of the screen, and all the
tiles are independent of each other, so it makes no sense whatsoever to use
multiple Unicode characters to represent the same legacy character. In the
context of both the PET/VIC20 and C64 versions of PETSCII, the characters
representing horizontal and vertical lines match the thickness of the common
light box drawing characters, and do not match 1÷8 blocks on the C64;
therefore it is inappropriate to identify them as a set of 1÷8 blocks. The
same applies to the Apple II compatibility characters 1FB7C 1FB80–1FB81
1FBB5–1FBB8 1FBBC, which are also defective for reasons I explained in
L2/25-037. Some of those characters are also used on other platforms (across
both 13.0 and 16.0), which I haven't analyzed thoroughly but which also have
similar issues.

Therefore, 1FB70–1FB81
1FBB5–1FBB8 1FBBC are defective, because their character identity mismatches
that of the original characters on the source platforms. The Unicode 16.0
change to the character identity of U+1FB81 does not resolve the issue either,
as it makes the third and fifth blocks unspecified but still enforces 1÷8
blocks on top and bottom. Nor can this be resolved by changing the identity of
those characters to light box drawings or unspecified thickness, because that
would violate the consistency with 2581 258F 2594–2595 and disrupt
implementations that rely on that consistency. And forget about contextual
substitutions and other overcomplicated mechanisms, because they're completely
irrelevant in the context of a grid of independent character tiles.

Regarding the
L2/25-010 claim that this issue 'can be solved by using appropriate fonts',
in the case of PETSCII PET/VIC20, the source platform font does in fact match
the character identities of the Unicode mapping. In the case of Apple II, the
source platform could be considered to match the character identities if the
left and right 1÷8 blocks are rounded to 1 pixel in the width of 7 pixels, but
it makes no sense for the character identities to hinge on platform-specific
rounding when there is already a consistent light box drawing thickness to work
with. In the case of PETSCII C64, the source platform font mismatches the
character identities of the Unicode mapping, making it impossible to resolve
using 'appropriate fonts'. In the case of HP 264x, the source platform font has
two different glyphs for two different character identities in the same
encoding for the same Unicode character, which also makes it impossible to
resolve using 'appropriate fonts'. So how is anyone ever supposed to
use those characters in the context of PETSCII C64 or HP 264x encoding?
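
The thickness argument above can be checked with a few lines of Python. This
is only a rough sketch and not part of L2/25-037: the cell sizes (8×8 pixels
for PETSCII, 7×8 for Apple II) and the stroke widths in pixels are assumptions
chosen to reproduce the fractions quoted in this message, and the names
PLATFORMS, stroke_fractions and matches_eighth_blocks are made up for the
illustration.

```python
from fractions import Fraction

# Assumed character cell sizes and light-line stroke widths, in pixels,
# given as (horizontal, vertical). Chosen to match the fractions quoted
# in the message; not taken from any official table.
PLATFORMS = {
    "PETSCII PET/VIC20": {"cell": (8, 8), "stroke": (1, 1)},
    "PETSCII C64": {"cell": (8, 8), "stroke": (2, 2)},
    "Apple II": {"cell": (7, 8), "stroke": (1, 1)},
}

EIGHTH = Fraction(1, 8)


def stroke_fractions(cell, stroke):
    """Return the stroke thickness as a fraction of cell width and height."""
    return tuple(Fraction(s, c) for s, c in zip(stroke, cell))


def matches_eighth_blocks(cell, stroke):
    """True only if the stroke is exactly 1/8 of the cell in both directions."""
    return all(f == EIGHTH for f in stroke_fractions(cell, stroke))


if __name__ == "__main__":
    for name, spec in PLATFORMS.items():
        fx, fy = stroke_fractions(**spec)
        verdict = "matches" if matches_eighth_blocks(**spec) else "does not match"
        print(f"{name}: {fx} of width, {fy} of height, {verdict} 1/8 blocks")
```

Under these assumptions only the PET/VIC20 glyphs coincide exactly with 1÷8
blocks; the C64 strokes are 1÷4 of the cell, and the Apple II strokes agree
with 1÷8 only vertically.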