Re: Non-ascii string processing?

Peter Kirk Tue, 07 Oct 2003 06:21:30 -0700

On 07/10/2003 04:35, Jill Ramonsky wrote:

No. What you have demonstrated below is that given an API based on characters, one can write an API based on default grapheme clusters. Nonetheless, it is only the /resulting/ default-grapheme-cluster-based API which would actually be of any use to end-users.

...and anyone who even /thinks/ of writing an API based on default grapheme clusters is surely competent enough to write that (almost trivial) character-based middle layer themselves.
I have yet to see an APPLICATION which needs a character-based API.
Jill

Well, application programming with default grapheme clusters will be fairly trivial once we have a computer language whose string processing works transparently and efficiently with arbitrary-length characters, I mean, default grapheme clusters. Until such languages are widely available, and given that for very many widely used natural languages (if NFC is used) characters and DGCs coincide, I would much prefer to work with a character-based API than always have to do my own combining of UTF-8 bytes.
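To make the three levels concrete (UTF-8 bytes, characters, DGCs), here is a minimal Python sketch. The helper approx_clusters and the sample string are my own illustrations, not part of any library. The grouping rule (a base character plus any combining marks that follow it) is only an approximation of default grapheme clusters; it ignores Hangul jamo, joiners, and the other cases covered by the real segmentation rules.

```python
import unicodedata


def approx_clusters(text):
    """Group each character with the combining marks that follow it.

    Hypothetical helper: a rough stand-in for default grapheme clusters,
    not a full implementation of the Unicode segmentation rules.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Illustrative string: bet + dagesh + qamats, then resh + segol.
word = "\u05D1\u05BC\u05B8\u05E8\u05B6"

print(len(word.encode("utf-8")))   # number of UTF-8 bytes
print(len(word))                   # number of characters (code points)
print(len(approx_clusters(word)))  # number of approximate DGCs
```

For this string the three counts are 10, 5 and 2, which is the gap between the byte level, the character level, and the cluster level that the discussion is about.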

Anyway, DGCs are not always what you want to work with. I work a lot with pointed Hebrew texts. For most purposes (though not for calculating space taken up on a line) the entities I need to work with correspond to Unicode characters rather than DGCs, for I work separately with the base characters (mostly consonants), the vowel points and the accents. In some cases the match is not precise, but it is a lot more convenient for my work if I can access a string character by character, rather than UTF-8 byte by UTF-8 byte or DGC by DGC. And, by the way, I have real examples of DGCs in Hebrew consisting of six characters.
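As a rough sketch of why character-level access is convenient here, the following Python separates a pointed word into letters, points and accents by looking at each character's Unicode name. The function split_layers and the sample word are illustrative assumptions, and sorting by name prefix is a simplification: a few Hebrew marks are named neither POINT nor ACCENT, and they land in "other".

```python
import unicodedata


def split_layers(text):
    """Sort the characters of a pointed Hebrew string into layers.

    Hypothetical helper; classification relies on the Unicode character
    names, which is a simplification rather than a complete scheme.
    """
    layers = {"letters": [], "points": [], "accents": [], "other": []}
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("HEBREW LETTER"):
            layers["letters"].append(ch)
        elif name.startswith("HEBREW POINT"):
            layers["points"].append(ch)
        elif name.startswith("HEBREW ACCENT"):
            layers["accents"].append(ch)
        else:
            layers["other"].append(ch)
    return layers


# Illustrative word: six letters, five points (including dagesh and
# shin dot) and one accent (tipeha), twelve characters in all.
word = ("\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0"
        "\u05E9\u05C1\u05B4\u0596\u05D9\u05EA")

layers = split_layers(word)
consonants = "".join(layers["letters"])  # the unpointed text
print(len(layers["letters"]), len(layers["points"]), len(layers["accents"]))
```

With a character-based string type this is a single pass over the string. Starting from UTF-8 bytes one would first have to decode, and starting from DGCs one would first have to take each cluster apart again.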

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
