Arcane Jill wrote:
# for all possible octet sequences s:
# length of (UTF-8(f(s))) <= length of s,
No, that is not the requirement. It is: bytelength(f(s)) <= 2*bytelength(s)
You haven't understood. By definition, s is an octet stream, and f(s) is a Unicode character stream - and therefore "bytelength(f(s))" is completely meaningless. You cannot take the byte-length of a Unicode character or a Unicode character stream. "bytelength(UTF-8(f(s)))", on the other hand, does make sense.
And I say again, your own solution, in which (for example) 0x9F maps to U+EE9F, does not meet the requirement, since UTF-8(U+EE9F) is { EE BA 9F }, the byte-length of which is 3, and 3 > 2 * 1.
What was wrong with my suggestion which would have mapped 0x9F to { U+0002 U+001F }, by the way? This actually /does/ meet your new requirement.
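The arithmetic behind both claims can be checked mechanically. Below is a minimal Python sketch, assuming the requirement is read as bytelength(UTF-8(f(s))) <= 2 * bytelength(s). The helper names utf8_len and meets_bound are hypothetical and exist only for this illustration; neither mapping is implemented here, only the two outputs for the single octet 0x9F are compared.

```python
def utf8_len(code_points):
    """Number of bytes in the UTF-8 encoding of a sequence of code points."""
    text = "".join(chr(cp) for cp in code_points)
    return len(text.encode("utf-8"))


def meets_bound(octets, code_points, factor=2):
    """True if bytelength(UTF-8(f(s))) <= factor * bytelength(s).

    `octets` is the input s; `code_points` is the claimed value of f(s).
    The factor of 2 is the bound stated in this thread (an assumption
    about how the requirement is meant to be read).
    """
    return utf8_len(code_points) <= factor * len(octets)


s = bytes([0x9F])            # the single input octet under discussion
pua = [0xEE9F]               # mapping 0x9F to U+EE9F
pair = [0x0002, 0x001F]      # mapping 0x9F to { U+0002 U+001F }

# Show the actual UTF-8 bytes of U+EE9F.
print(" ".join(f"{b:02X}" for b in chr(0xEE9F).encode("utf-8")))  # EE BA 9F

print(utf8_len(pua), meets_bound(s, pua))    # 3 False
print(utf8_len(pair), meets_bound(s, pair))  # 2 True
```

Both U+0002 and U+001F lie below U+0080, so each encodes as a single byte in UTF-8, which is why the two-character mapping stays within the bound while the three-byte encoding of U+EE9F does not.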
Jill
-----Original Message-----
From: Lars Kristan [mailto:[EMAIL PROTECTED]
Sent: 16 December 2004 11:54
To: 'Arcane Jill'; Unicode
Subject: RE: Roundtripping Solved