Arcane Jill wrote:
#    for all possible octet sequences s:
#        length of (UTF-8(f(s))) <= length of s,

No, that is not the requirement. It is:
bytelength(f(s)) <= 2*bytelength(s)

You haven't understood. By definition, s is an octet stream and f(s) is a Unicode character stream, and therefore "bytelength(f(s))" is completely meaningless. You cannot take the byte-length of a Unicode character or of a Unicode character stream; a byte-length exists only once you have chosen an encoding. "bytelength(UTF-8(f(s)))", on the other hand, does make sense.


And I say again, your own solution, in which (for example) 0x9F maps to U+EE9F, does not meet the requirement, since UTF-8(U+EE9F) is { EE BA 9F }, the byte-length of which is 3, and 3 > 2 * 1.

What was wrong with my suggestion which would have mapped 0x9F to { U+0002 U+001F }, by the way? This actually /does/ meet your new requirement.
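
For anyone who wants to check the arithmetic, here is a rough Python sketch. The names utf8_len and meets_bound are mine, purely for illustration, and I am not implementing f itself, only testing the two candidate images of the single octet 0x9F against the bound bytelength(UTF-8(f(s))) <= 2*bytelength(s).

```python
def utf8_len(chars: str) -> int:
    """Number of octets in the UTF-8 encoding of a Unicode character stream."""
    return len(chars.encode("utf-8"))


def meets_bound(octets: bytes, chars: str, factor: int = 2) -> bool:
    """True if bytelength(UTF-8(chars)) <= factor * bytelength(octets)."""
    return utf8_len(chars) <= factor * len(octets)


s = b"\x9f"                # the input octet stream: one invalid octet
pua = "\uee9f"             # candidate image 1: 0x9F -> U+EE9F
pair = "\u0002\u001f"      # candidate image 2: 0x9F -> { U+0002 U+001F }

for label, image in (("U+EE9F", pua), ("U+0002 U+001F", pair)):
    encoded = image.encode("utf-8")
    print(label, encoded.hex(), len(encoded), meets_bound(s, image))
```

The first mapping encodes to three octets and fails the bound; the second encodes to two octets and satisfies it.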

Jill

-----Original Message-----
From: Lars Kristan [mailto:[EMAIL PROTECTED]
Sent: 16 December 2004 11:54
To: 'Arcane Jill'; Unicode
Subject: RE: Roundtripping Solved



