RE: Roundtripping Solved

Arcane Jill Thu, 16 Dec 2004 07:23:01 -0800

Arcane Jill wrote:

#    for all possible octet sequences s:
#        length of UTF-8(f(s)) <= length of s,

Lars Kristan wrote:

No, that is not the requirement. It is:
bytelength(f(s)) <= 2*bytelength(s)

You haven't understood. By definition, s is an octet stream and f(s) is a Unicode character stream, so "bytelength(f(s))" is completely meaningless. You cannot take the byte-length of a Unicode character or a Unicode character stream until you have chosen an encoding for it. "bytelength(UTF-8(f(s)))", on the other hand, does make sense.

And I say again, your own solution, in which (for example) 0x9F maps to U+EE9F, does not meet the requirement, since UTF-8(U+EE9F) is { EE BA 9F }, the byte-length of which is 3, and 3 > 2 * 1.
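For anyone who wants to check the arithmetic rather than take my word for it, here is a minimal Python sketch. It uses only the standard str.encode call; the variable names are mine and purely illustrative.

```python
# U+EE9F is the code point that the single octet 0x9F maps to
# in the scheme under discussion.
encoded = "\uee9f".encode("utf-8")

# Show the individual UTF-8 bytes and how many there are.
print(" ".join(f"{b:02X}" for b in encoded))  # EE BA 9F
print(len(encoded))                           # 3
```

Three bytes out for one byte in, which is more than the factor of two allowed.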

What was wrong with my suggestion which would have mapped 0x9F to { U+0002 U+001F }, by the way? This actually /does/ meet your new requirement.
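The same sort of check can be applied to the restated requirement, bytelength(UTF-8(f(s))) <= 2 * bytelength(s). This is only a sketch for the single octet 0x9F; the helper name meets_requirement is made up for illustration, and it tests one input at a time rather than every possible octet sequence.

```python
def meets_requirement(s: bytes, mapped: str) -> bool:
    """Return True if the UTF-8 form of `mapped` is at most twice as long as `s`.

    `s` is the original octet sequence and `mapped` is the character
    sequence it was converted to. (Hypothetical helper, for illustration.)
    """
    return len(mapped.encode("utf-8")) <= 2 * len(s)


s = b"\x9f"

# The mapping suggested here: 0x9F -> { U+0002 U+001F }, two bytes in UTF-8.
print(meets_requirement(s, "\u0002\u001f"))  # True

# The mapping being disputed: 0x9F -> U+EE9F, three bytes in UTF-8.
print(meets_requirement(s, "\uee9f"))        # False
```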

Jill

-----Original Message-----
From: Lars Kristan [mailto:[EMAIL PROTECTED]
Sent: 16 December 2004 11:54
To: 'Arcane Jill'; Unicode
Subject: RE: Roundtripping Solved
