On 11/11/19 3:49 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...
>
>> No.
> Aaaah, my apologies. We are talking about different things. You are talking
> about a combination of Unicode code points vs. a full character. I take it
> back. Yes, if you are combining these, then, of course, you are going to get
> a different word count, because there are actually several characters
> involved. We are talking pieces vs. full words. If there is a combination, it
> is just like the accented e, é: why not use the one character vs. the
> combination?
>
> josé

Because not all accented characters have a single code-point. In my
example there is one, because Greek was encoded early on. At some point in
the work on Unicode, they realized that there really were too many
combinations that happen in real life to try to assign code-points to
all of them. The same thing happened with the CJK characters: there are a
very large number of them, far more than they want to give code-points to,
so a large number of archaic forms, which are now used mostly only in
names, are built with composing characters. (Back to problems with names.)
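To make that concrete, here is a small Python sketch using the standard
unicodedata module (the helper name code_points is just for illustration).
NFC normalization folds "e" + COMBINING ACUTE ACCENT into the single
precomposed é, but "q" + the same accent has no precomposed code point, so it
stays as two code points however it is normalized.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: list each code point of s as 'U+XXXX'."""
    return [f"U+{ord(c):04X}" for c in s]

# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9), so NFC merges it.
e_acute = unicodedata.normalize("NFC", "e\u0301")
# "q" + COMBINING ACUTE ACCENT has no precomposed code point, so NFC keeps two.
q_acute = unicodedata.normalize("NFC", "q\u0301")

print(code_points(e_acute))  # ['U+00E9']
print(code_points(q_acute))  # ['U+0071', 'U+0301']
```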

The article at http://unicode.org/faq/char_combmark.html gives some
examples; one is:

The Devanagari syllable "ni" must be composed using a base character
"na" (न) followed by a combining vowel for the "i" sound ( ि), although
end users see and think of the combination of the two "नि" as a single
unit of text.
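The FAQ's example can be checked the same way with Python's unicodedata
module: the syllable is two code points, a letter (category Lo) followed by a
spacing combining vowel sign (category Mc), and since no precomposed "ni"
exists, NFC leaves it at two code points.

```python
import unicodedata

ni = "\u0928\u093F"  # DEVANAGARI LETTER NA + DEVANAGARI VOWEL SIGN I ("नि")

for ch in ni:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
# U+0928 Lo DEVANAGARI LETTER NA
# U+093F Mc DEVANAGARI VOWEL SIGN I

# No precomposed "ni" exists, so normalization cannot shrink it to one code point.
assert unicodedata.normalize("NFC", ni) == ni
print(len(ni))  # 2 code points, though readers see one unit
```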

So the question becomes: when do you REALLY need to know how many
code-points are in a string, or to take a specific number of them? Having
a fixed number of code-units (or bytes) can be useful for building indexes,
where a fixed size makes addressing easier for searching. Counting by
glyphs is sometimes useful at the presentation layer (but needs to be
combined with character widths).
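As a rough illustration of how far apart these counts can be, the sketch
below measures one string in UTF-8 bytes, UTF-16 code units, and code points,
plus crude approximations of user-perceived characters and display columns.
The helpers counts and truncate_utf8 are hypothetical, and the grapheme and
width rules are simplifications (real segmentation follows Unicode UAX #29,
and terminal width rules vary), not a complete implementation.

```python
import unicodedata

def counts(s):
    """Measure one string in several units; the last two are rough approximations."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
        # Rough stand-in for user-perceived characters: count code points that
        # are not combining marks (general category M*). Real grapheme
        # segmentation (Unicode UAX #29) has many more rules.
        "approx_graphemes": sum(1 for c in s if not unicodedata.category(c).startswith("M")),
        # Rough terminal-style width: nonspacing/enclosing marks take no column,
        # East Asian wide/fullwidth characters take two, everything else one.
        "approx_columns": sum(
            0 if unicodedata.category(c) in ("Mn", "Me")
            else 2 if unicodedata.east_asian_width(c) in ("W", "F")
            else 1
            for c in s
        ),
    }

def truncate_utf8(s, max_bytes):
    """Cut s to at most max_bytes of UTF-8 without splitting a code point.

    Note: this can still separate a base letter from its combining marks.
    """
    return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

print(counts("jose\u0301"))  # "josé" spelled with a combining acute accent
# {'utf8_bytes': 6, 'utf16_units': 5, 'code_points': 5, 'approx_graphemes': 4, 'approx_columns': 4}
print(counts("\u6f22\u5b57"))  # "漢字"
# {'utf8_bytes': 6, 'utf16_units': 2, 'code_points': 2, 'approx_graphemes': 2, 'approx_columns': 4}
print(truncate_utf8("jose\u0301", 5))
# jose  (the accent's second byte did not fit, so the whole accent is dropped)
```

The truncation result shows the index-building trade-off: cutting at a fixed
byte count is easy to keep valid at the code-point level, but it can still
strip an accent off its base letter.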

An input method would need to deal with the characters as code-points
(likely decomposed), but it probably also needs to know the glyph
boundaries to position the cursor (unless that can be handled by the
output method it uses).
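A cursor that moves by code point would land between "e" and its accent in a
decomposed string. Here is a minimal sketch of stepping by whole base+mark
sequences instead, assuming Python and a hypothetical cursor_stops helper. It
only treats combining marks as extending the previous character, which is a
simplification of the full grapheme-cluster rules.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (Unicode general category M*)."""
    return unicodedata.category(ch).startswith("M")

def cursor_stops(s):
    """Code-point offsets where a cursor may sit without splitting base + marks.

    Rough approximation of grapheme-cluster boundaries (UAX #29 has more rules):
    a stop is placed before every code point that is not a combining mark, plus
    one at the start and one at the end of the string.
    """
    stops = [i for i, ch in enumerate(s) if i == 0 or not is_mark(ch)]
    stops.append(len(s))
    return stops

text = unicodedata.normalize("NFD", "jos\u00e9")  # decomposed: "jose" + U+0301
print(len(text), cursor_stops(text))  # 5 [0, 1, 2, 3, 5]
print(cursor_stops("\u0928\u093F"))    # "नि": [0, 2]
```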

-- 
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
