Re: UTF-8 syntax

Peter_Constable Thu, 07 Jun 2001 20:41:37 -0700

On 06/07/2001 08:50:37 PM Jianping Yang wrote:

>I don't get point from this argument as UTF-8S is exactly mapped to UTF-16 in
>UTF-16 code unit which means one UTF-16 code unit will be mapped to either one,
>two, or three bytes in UTF-8S. So if you are saying there is ambiguous in
>UTF-8S, it should also apply to UTF-16, which does not make sense to me.

You know what? After all my harping, you're absolutely right on that point.
I was focused on a strict application of the definitions as currently
stated in the standard to UTF-8s (and UTF-8), and wasn't thinking about how
UTF-16 weighs up against them. Indeed, UTF-16 has the same problem in
relation to the definitions. My conclusion is that the definitions have
some problems and need to be revised -- this is something that I have
thought for some time now. (Indeed, I pointed out yesterday that D36(c)
appears to be in contradiction with D32.) I happen to know that the
editorial committee is, in fact, reworking the definitions and the text of
the Standard to make it consistent and bring it in line with the character
model, UTR#17.

I suspect that after the definitions are refined, the objection I was
trying to raise against UTF-8s will have been eliminated. Either way, it
*does* apply equally to UTF-16, and thus I can't bring it against UTF-8s. I
therefore concede that argument.
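
To make the mapping under discussion concrete, here is a minimal Python
sketch of encoding one UTF-16 code unit at a time into one, two, or three
bytes, as Jianping describes for UTF-8S. The function names are made up for
illustration, and the byte layout is an assumption based on the description
in this thread (the ordinary UTF-8 bit pattern applied to each 16-bit code
unit), not on any published specification of UTF-8S.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def encode_code_unit(unit):
    """Encode one 16-bit code unit as 1, 2, or 3 bytes (assumed UTF-8S layout)."""
    if unit < 0x80:
        return bytes([unit])
    if unit < 0x800:
        return bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
    # Everything else, including surrogate code units, takes three bytes.
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def encode_utf8s(text):
    """Hypothetical UTF-8S encoder: each UTF-16 code unit is encoded separately."""
    return b"".join(encode_code_unit(u) for u in utf16_code_units(text))


# U+10000 is two code units in UTF-16 (D800 DC00), so under this sketch it
# becomes two three-byte sequences, where standard UTF-8 uses four bytes.
supplementary = "\U00010000"
print(utf16_code_units(supplementary) == [0xD800, 0xDC00])
print(encode_utf8s(supplementary).hex())
print(supplementary.encode("utf-8").hex())
```

Characters in the BMP come out byte-for-byte identical to standard UTF-8 in
this sketch; only supplementary characters differ, because each half of the
surrogate pair is encoded on its own.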

There are still a bunch of other arguments out there against UTF-8s that
have yet to be refuted, however.


- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>