Igor Tandetnik, on Monday, November 11, 2019 02:56 PM, wrote...
>
> On 11/11/2019 12:30 PM, Jose Isaias Cabrera wrote:
> >
> > Igor Tandetnik, on Monday, November 11, 2019 11:02 AM, wrote...
> >>> Most people have to figure out what Unicode they are using, count the
> >>> bytes, divide by... and on, and on.  Not me, I just take that UTF8, or
> >>> UTF16 string, convert it to UTF32, and do a count.
> >>
> >> And then what do you do with that count? What do you use it for?
> >
> > Say that I am writing a report and I only want to print the first 20
> > characters of a string.
>
> A sequence of Unicode codepoints U+006F U+0302 U+0301 should be rendered as a
> single grapheme (ố) - what a human would think of as a "character". This is
> an actual character in Vietnamese. Now, if you have several such triplets in
> a row in your string, and you chop it at 20 codepoints, you'll only print 7
> graphemes / "characters". Moreover, you'll end up dropping the last combining
> accent, producing a different grapheme (ô) and potentially altering the
> meaning of the text. (Don't know how much of a danger this is in Vietnamese,
> but I know that combining viramas
> https://www.compart.com/en/unicode/combining/9 are vital to Indic languages,
> and dropping one will in fact often produce a valid but different word).

Yes, dropping pieces of words is a problem in any language.
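
For anyone who wants to see the effect described above, here is a minimal Python sketch. It is only an approximation: the helper names (split_clusters, truncate_clusters) are made up for this illustration, and the grouping rule (a base code point plus any following code points with a nonzero canonical combining class) is a simplification, not the full grapheme cluster algorithm from Unicode Standard Annex #29. It compares cutting at 20 code points with cutting at 20 approximate clusters.

```python
import unicodedata


def split_clusters(text):
    """Approximate grapheme clusters (hypothetical helper, not UAX #29).

    Each cluster is one code point followed by any code points whose
    canonical combining class is nonzero (accents, viramas, and so on).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


def truncate_clusters(text, limit):
    """Keep at most `limit` approximate clusters, never splitting one."""
    return "".join(split_clusters(text)[:limit])


# U+006F U+0302 U+0301: "o" + combining circumflex + combining acute
triplet = "\u006f\u0302\u0301"
text = triplet * 25                 # 75 code points, 25 clusters

by_codepoint = text[:20]            # Python slices str by code point
by_cluster = truncate_clusters(text, 20)

print(len(split_clusters(by_codepoint)))   # 7
print(len(split_clusters(by_cluster)))     # 20
```

Cutting at 20 code points leaves six complete triplets plus "o" and the circumflex, so the seventh cluster has lost its acute accent. Cutting by cluster keeps 20 complete triplets (60 code points). A production report would more likely rely on a library that implements the real segmentation rules, such as ICU.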
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
