Arcane Jill <arcanejill at ramonsky dot com> wrote:

> DEFINITION - "f" is a function which maps an arbitrary octet stream to
> a sequence of Unicode characters, such that (1) any substring which
> happens to be valid UTF-8 is mapped to the sequence of Unicode
> characters which would have been produced by UTF-8, and (2) all
> remaining single octets, xx (with x necessarily such that 0x80 <= xx
> <= 0xFF) are each mapped to the sequence: { U+0C55E3, U+01ED7A,
> U+05FDCB, U+09C351, U+07E168, U+0BBC80, U+107C09, U+0BA458, U+064188,
> U+048375, U+08ACE0, U+031DEF, U+00xx } (I got those numbers from a
> true random number generator).
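For illustration, here is a minimal Python sketch of the function "f" as defined above. It assumes a greedy left-to-right reading: valid UTF-8 is decoded normally, and each octet the decoder cannot accept is escaped individually, after which decoding resumes at the very next octet. "Valid UTF-8" here is whatever Python's strict UTF-8 decoder accepts (it rejects overlongs, encoded surrogates and values past U+10FFFF). The error-handler name "jill-escape" is hypothetical.

```python
import codecs

# The 12 "random" code points from the definition; the 13th element of each
# escape sequence, U+00xx, is the offending octet itself (0x80..0xFF).
PREFIX = (0x0C55E3, 0x01ED7A, 0x05FDCB, 0x09C351, 0x07E168, 0x0BBC80,
          0x107C09, 0x0BA458, 0x064188, 0x048375, 0x08ACE0, 0x031DEF)
ESCAPE = "".join(chr(cp) for cp in PREFIX)


def _escape_one_octet(exc):
    """Escape only the first rejected octet, then resume right after it."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    octet = exc.object[exc.start]  # ASCII is always valid, so this is >= 0x80
    return ESCAPE + chr(octet), exc.start + 1


# Hypothetical handler name, registered for this sketch only.
codecs.register_error("jill-escape", _escape_one_octet)


def f(data: bytes) -> str:
    """Map an arbitrary octet stream to a Unicode string per the definition."""
    return data.decode("utf-8", errors="jill-escape")
```

For example, f(b"caf\xc3\xa9") is simply "café", while f(b"a\x80b") yields "a", the 12 prefix code points, U+0080, then "b". Resuming one octet past the error (rather than at the end of the decoder's reported bad span) is what makes the escape apply to single octets, as clause (2) requires.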
Reminds me of Masahiko Maedera's "UTF-16X" proposal, which used triples of code points in the block U+EExxx to represent values above 0x110000, under the (false) assumption that such a thing was needed.

Of course, Jill's scheme uses non-private-use Unicode scalar values to achieve what is essentially a private-use function, so this is still non-conformant. (A similar scheme that only used code points from the Plane 0, Plane 15, and Plane 16 PUAs would be fine.) But I gather that Lars isn't too worried about being non-conformant, or we wouldn't be having this thread.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/