RE: Roundtripping Solved

Lars Kristan Thu, 16 Dec 2004 04:14:33 -0800

Peter Kirk wrote in response to Arcane Jill:
> > 3) There exists an inverse function, g(), such that g(a) == b if and
> > only if f(b) == a.
>
> Lars seems to have extended the requirement here such that a can be any
> sequence of 16-bit words, just as b can be any sequence of octets, i.e.
> he requires not only that g(f(b)) == b for all b, but also that f(g(a))
> == a for all a. That may make things much harder! There is at least a
> need to deal with unpaired surrogates.

I should have analyzed Jill's mail more carefully. This must be a misunderstanding.

My requirement is that g(f(b))=b for all b, that is, the round trip NON-UTF-8 => UTF-16 => NON-UTF-8.

However, f(g(a))=a was not my requirement. I even assert the two cannot be achieved at the same time.

If the two requirements could be met at the same time, there would be no problem and everybody would accept the solution since meeting f(g(a))=a keeps all Unicoders happy.

There are other requirements, or at least wishes. One is that f(b) for a single byte b should be a single BMP code point.
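To make the two properties concrete, here is a minimal Python sketch. It is not the algorithm under discussion, only an illustration under stated assumptions: the escape range U+EE80..U+EEFF is a hypothetical choice of 128 private-use code points, and the names ESC_BASE, f and g are made up for this example. It shows g(f(b)) == b holding for arbitrary bytes, each invalid byte becoming one BMP code point, and f(g(a)) == a failing for at least one a.

```python
# Hypothetical escape range: 128 private-use BMP code points, one per byte
# value 0x80..0xFF. This range is an assumption made for the sketch.
ESC_BASE = 0xEE80


def _esc(byte: int) -> str:
    """Map a byte in 0x80..0xFF to its (hypothetical) escape code point."""
    return chr(ESC_BASE + (byte - 0x80))


def f(b: bytes) -> str:
    """Bytes -> text. Invalid bytes become single escape code points."""
    out = []
    # 'surrogateescape' turns each undecodable byte into U+DC80..U+DCFF;
    # those are re-mapped below so the result holds no lone surrogates.
    for ch in b.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(_esc(cp - 0xDC00))
        elif ESC_BASE <= cp <= ESC_BASE + 0x7F:
            # A validly encoded escape code point must itself be escaped
            # byte by byte, otherwise g(f(b)) == b would break.
            out.extend(_esc(x) for x in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def g(a: str) -> bytes:
    """Text -> bytes. Assumes a has no lone surrogates (strict encode)."""
    out = bytearray()
    for ch in a:
        cp = ord(ch)
        if ESC_BASE <= cp <= ESC_BASE + 0x7F:
            out.append(cp - ESC_BASE + 0x80)
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


# g(f(b)) == b, even for bytes that are not valid UTF-8.
b = b"abc\xff\xc3\xa9\x80"
assert g(f(b)) == b

# f(g(a)) == a fails: two escapes whose bytes happen to form valid UTF-8.
a = "\uEEC3\uEEA9"
assert f(g(a)) != a
```

In this sketch the failing case arises because g(a) yields the bytes C3 A9, which are valid UTF-8, so f decodes them as an ordinary character instead of reproducing the two escapes.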

I think devising new algorithms will not help. What would be useful would be a proof that my algorithm doesn't break the rules of Unicode. OK, it does. So, try again: What would be useful would be a proof that my algorithm doesn't break the *functionality* that Unicode rules provide.

> can use either U+FFFE or U+FFFF, which "are intended for process
> internal uses, but are not permitted for interchange." Let's call the
> one non-character chosen INVALID.

Can't. I DO want the resulting UTF-16 to be valid for interchange. This is the whole purpose. And increasing the overhead is also not desired.

Lars
