On 11/11/19 2:16 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 12:50 PM, wrote...
>
>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>> screen.
> This is not true, if the string has more or at least 20 UTF32 characters, and 
> you request 20 character while still talking UTF32, it will print 20.  Once 
> you move to UTF16 or UTF8, then, yes, you are correct.
You will get twenty code points, but not necessarily twenty glyphs. UTF-32
has the property that one code unit is one code point (which UTF-8 and
UTF-16 don't have), but it does not make one code point equal one glyph.
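
To make the code-unit versus code-point distinction concrete, here is a
small Python sketch. The helper name code_units and the sample string are
my own choices for illustration, not anything from SQLite:

```python
# Count code points and code units for the same text in each encoding.
# `code_units` is a hypothetical helper written for this example.
def code_units(text: str) -> dict:
    return {
        "code points": len(text),                            # Python str is indexed by code point
        "UTF-8 units": len(text.encode("utf-8")),            # 1-byte code units
        "UTF-16 units": len(text.encode("utf-16-le")) // 2,  # 2-byte code units
        "UTF-32 units": len(text.encode("utf-32-le")) // 4,  # 4-byte code units
    }

# 'a', Greek small beta, and one code point outside the BMP (an emoji)
sample = "a\u03b2\U0001F600"
counts = code_units(sample)
print(counts)
```

Only the UTF-32 count matches the code-point count here. None of the
counts says anything about how many glyphs end up on the screen.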
>> One quick way to see this is that there is a need for NFD and NFC
>> representations, because some characters can be decomposed from a
>> combined character into a base character + a combining character, so a
>> string in NFD form may naturally 'compress' itself when being printed.
> This is the reason why you want to use UTF32.  UTF8, and UTF16 has to use 
> combination of their character set to cover Eastern languages.  While all 
> languages fit perfectly in UTF32 and they all have their own unique home.
>
> josé

No.

A simple example: Ἀβιά vs Ἀβιά

Both are 4 glyphs, or what we would call characters. The first is 6
code-points (U+391, U+313, U+3B2, U+3B9, U+3B1, U+301); the second is 4
code-points (U+1F08, U+3B2, U+3B9, U+3AC).
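
If you want to check this yourself, a minimal sketch using Python's
standard unicodedata module follows. The variable names are mine, and the
strings are written as escapes so that nothing along the mail path can
silently normalize them:

```python
import unicodedata

# The two spellings of the same word, as listed above.
decomposed = "\u0391\u0313\u03b2\u03b9\u03b1\u0301"  # 6 code points
composed = "\u1f08\u03b2\u03b9\u03ac"                # 4 code points

print(len(decomposed), len(composed))  # code-point counts differ
print(decomposed == composed)          # a plain comparison sees different strings

# Normalizing converts one form into the other.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)
```

Both strings render as the same four glyphs, yet they compare unequal
until you normalize them to a common form.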

In this case the decomposed sequences happen to match precomposed
characters, but that is not always true: some less common composed glyphs
do not have a single code point assigned to them.
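
As a sketch of a glyph with no precomposed form, take a Latin q with a
tilde over it. That is my own pick for illustration; as far as I know
Unicode has no single code point for it, so NFC has nothing to compose it
into:

```python
import unicodedata

# 'q' followed by U+0303 COMBINING TILDE: one glyph on screen, two code points.
seq = "q\u0303"
nfc = unicodedata.normalize("NFC", seq)

print(len(seq), len(nfc))                 # NFC cannot shorten this sequence
print([f"U+{ord(c):04X}" for c in nfc])   # still the base letter plus the combining mark
```

So even in UTF-32, and even after NFC, this single user-visible character
takes two code points.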

This shows that 1 code point does not equal 1 character, for the usual
user definition of a character.

There are a NUMBER of places in Unicode where it takes multiple code points
to express a single glyph to the user. Very shortly after they realized
they needed to extend Unicode beyond the initial 16-bit character set they
first thought it could be, they also realized that they could never reach
the goal of assigning a unique code point to the basic glyphs of every
language, so they settled on letting some (many) glyphs be expressed as a
combination of code points, with somewhat simple (but not trivial) rules on
how to do this.

-- 
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
