Re: [XeTeX] xetex and the unicode bidirectional algorithm.

Philip Taylor Mon, 09 Dec 2013 15:33:29 -0800


Keith J. Schultz wrote:
> Hi Phillip,
> 
> 1) I do not know Vietnamese!
> 
> 2) If I did, using the proper BMP would give me the answer,
>      as "sang" would be a sequence of singular octets, and Vietnamese
>      would use multi-byte sequences! 
> 
> case closed! Like I mentioned, there are often ways used to reduce the length
> of the multibyte sequences. In that case one has to know the process used to
> get the proper Unicode character code!


It is not necessary to "know" a language in order to be able to
algorithmically determine in which language a particular stretch
of text is written, if such algorithmic determination is possible.
I do not "know" Hebrew, but even I know that "בית דין‎" is Hebrew
and that "你好" is not.  What I do not know (and what I challenge
you to tell us) is whether "sang" is English or Vietnamese.
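To illustrate the point, here is a minimal Python sketch that uses only the standard `unicodedata` module. It treats the first word of each letter's Unicode character name (HEBREW, CJK, LATIN, ...) as a crude stand-in for its script. This is an illustrative heuristic, not a real language identifier, and the helper name `scripts_used` is hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return the first word of each letter's Unicode name (a rough proxy for its script)."""
    found = set()
    for ch in text:
        if ch.isalpha():  # skip spaces, punctuation and invisible marks
            found.add(unicodedata.name(ch).split()[0])
    return found


print(scripts_used("בית דין"))  # {'HEBREW'}
print(scripts_used("你好"))      # {'CJK'}
print(scripts_used("sang"))     # {'LATIN'}
```

Hebrew and Chinese can be recognised from their code points alone. For "sang", however, the only answer is LATIN, which says nothing about whether the word is English or Vietnamese.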

You wrote :  "for efficiency reasons, utf-8 strings are not properly
encoded and programs assume a particular language, to save space."

I invited you to tell us (the XeTeX list members, that is) what
would be a "properly encoded utf-8 string" for the sequence
"sang" which would enable a computer algorithm to determine
whether that string was "sang" (Vietnamese) or "sang" (English).

I am still hoping that you will be able to tell us what that
properly encoded utf-8 string is, rather than just metaphorically
waving your arms in the air while throwing around phrases such as
"proper BMP", "singular octets" and "multi-byte sequences".
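The same point can be made in terms of bytes, again as a small Python sketch. UTF-8 encodes code points, not languages, so "sang" produces the same four single-octet sequences whichever language the writer had in mind. Multi-byte sequences appear only when a letter outside ASCII is present, such as the precomposed á (U+00E1) in a Vietnamese word like "sáng". The variable names are purely illustrative.

```python
# UTF-8 maps code points to bytes; nothing in the encoding records a language.
english_sang = "sang"     # the English word
vietnamese_sang = "sang"  # the Vietnamese word, written with the same four letters

print(english_sang.encode("utf-8"))     # b'sang'
print(vietnamese_sang.encode("utf-8"))  # b'sang'
assert english_sang.encode("utf-8") == vietnamese_sang.encode("utf-8")

# A diacritic changes the code points, and only then are extra octets needed.
with_accent = "s\u00e1ng"  # "sáng" with precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE
print(with_accent.encode("utf-8"))       # b's\xc3\xa1ng'
print(len(with_accent.encode("utf-8")))  # 5
```

The two encodings of "sang" are byte-for-byte identical, so no "properly encoded" UTF-8 form could carry the distinction.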

Philip Taylor





--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
