On 11/11/19 3:49 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 02:37 PM, wrote...
>
>> No.
> Aaaah, my apologies. We are talking about different things. You are talking
> about a combination of Unicodes vs. full, character. I take it back. Yes, if
> you are combining these, then, of course, you are going to have to a
> different word count because there are actually characters being involved.
> are talking pieces vs. full words. If there is a combination, is just like
> the accented e, é, why not use the one character vs the combination?
>
> josé

Because not all accented characters have a single code-point. In my
example there is one, because Greek was worked on earlier. At some point
in the work on Unicode, they realized that there really were too many
combinations that happen in real life to try to assign code-points to all
of them. This also happened with the CJK characters: there are a very
large number of them, far more than they want to give code-points to, so
a large number of archaic forms, which are currently mostly only used in
names, are built with composing characters. (Back to problems with names.)

The article at http://unicode.org/faq/char_combmark.html gives some
examples. One is:

    The Devanagari syllable "ni" must be composed using a base character
    "na" (न) followed by a combining vowel for the "i" sound ( ि),
    although end users see and think of the combination of the two "नि"
    as a single unit of text.

So the question becomes: when do you REALLY need to know how many
code-points are in a string, or get a specific number of them?

Having a given number of code-units (or bytes) can be useful for building
indexes, where a fixed size makes addressing easier for searching.

Counting by glyphs is sometimes useful at the presentation layer (but
needs to be combined with character widths).

An input method would need to deal with the characters as code-points
(likely decomposed), but also probably needs to know about the glyph to
show the cursor (unless that can be handled by the output method that it
uses).

-- 
Richard Damon
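
To illustrate the different counts discussed above, here is a minimal sketch in Python using only the standard unicodedata module. The helper names counts and approx_visible_units are hypothetical, made up for this example. The "visible unit" count is only a rough approximation that skips combining marks; it is an assumption for illustration and not real grapheme cluster segmentation.

```python
import unicodedata


def counts(text):
    """Count one string three ways: code points, UTF-8 bytes, UTF-16 code units."""
    return {
        "code_points": len(text),  # Python str is indexed by code point
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


def approx_visible_units(text):
    """Rough stand-in for a glyph count: skip code points in a Mark category.

    Assumption: this is NOT full grapheme cluster segmentation (UAX #29);
    it only ignores categories Mn, Mc and Me.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
ni = "\u0928\u093f"      # Devanagari na followed by vowel sign i

# NFC folds e + accent into the precomposed é, so the two spellings compare equal.
assert unicodedata.normalize("NFC", decomposed) == composed

# There is no precomposed "ni", so NFC leaves it as two code points.
assert unicodedata.normalize("NFC", ni) == ni

for label, text in [("composed é", composed), ("decomposed é", decomposed), ("ni", ni)]:
    print(label, counts(text), "visible:", approx_visible_units(text))
```

Normalizing to NFC answers the "why not use the one character" question only where a precomposed code point exists. For the Devanagari syllable the code-point count stays at two however it is normalized, which is why the count you want depends on whether you are sizing storage, indexing, or drawing text.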