On 4/15/2012 7:30 PM, Rick McGowan wrote:
> At Wiktionary, we're looking at ẘ (U+1E98) and
> we can't figure out where it came from.

Good catch. It's obviously another stowaway...
Just throw it in the brig until we can get around to deporting it.

The 1E00 and 1F00 blocks were populated in Unicode 1.1 by rejects from Unicode 1.0 that were re-admitted as part of the merger with ISO/IEC 10646. If you have anyone with access to the early (paper-only) meeting documents of WG2, you might, just might, find a source for them.

Most of these characters were "rejected" because they were unnecessary: they are easily encoded as combining sequences, and there were no legacy character sets that needed them precomposed for 1:1 roundtrip compatibility. WG2 and Unicode (before the merger) had different standards for which compatibility characters were required.
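
To make "easily encoded as combining sequences" concrete, here is a minimal sketch using Python's standard unicodedata module. It assumes the Python build ships reasonably current Unicode data; the variable names are purely illustrative.

```python
import unicodedata

precomposed = "\u1E98"  # LATIN SMALL LETTER W WITH RING ABOVE

# Canonical decomposition mapping from the Unicode Character Database,
# returned as space-separated hex code points.
mapping = unicodedata.decomposition(precomposed)

# NFD replaces the precomposed character with its combining sequence.
decomposed = unicodedata.normalize("NFD", precomposed)
names = [unicodedata.name(ch) for ch in decomposed]

# NFC goes the other way and recomposes the sequence.
recomposed = unicodedata.normalize("NFC", decomposed)

print(mapping)                    # 0077 030A
print(names)                      # ['LATIN SMALL LETTER W', 'COMBINING RING ABOVE']
print(recomposed == precomposed)  # True
```

Note that only the decomposing forms (NFD, NFKD) take the precomposed character out of the data; NFC maps the combining sequence back to U+1E98.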

(There were some gaps in these blocks after the initial population of characters was added in Unicode 1.1. These were later filled with more solid candidates, so the "age" of each character is an important clue here.)

Stowaway is an apt term. Because the characters did not add anything new (they could already be encoded as combining sequences) and because normalization would remove them from the data stream, nobody tried very hard to fine-tune the set, since doing so would have risked the failure of the merger. Ideal conditions for "stowaways" to hide in the crowd.

A./
