On 11/11/19 2:16 PM, Jose Isaias Cabrera wrote:
> Richard Damon, on Monday, November 11, 2019 12:50 PM, wrote...
>
>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>> screen.
> This is not true, if the string has more or at least 20 UTF32 characters, and
> you request 20 character while still talking UTF32, it will print 20. Once
> you move to UTF16 or UTF8, then, yes, you are correct.

You will get twenty code points but not twenty glyphs. UTF-32 has the
property that one code-unit is one code-point (which UTF-8 and UTF-16
don't have), but not one code-point = 1 glyph.

>> One quick way to see this is that there is a need for NFD and NFC
>> representations, because some characters can be decomposed from a
>> combined character into a base character + a combining character, so a
>> string in NFD form may naturally 'compress' itself when being printed.
> This is the reason why you want to use UTF32. UTF8, and UTF16 has to use
> combination of their character set to cover Eastern languages. While all
> languages fit perfectly in UTF32 and they all have their own unique home.
>
> josé

No. A simple example:

Ἀβιά vs Ἀβιά

Both are 4 glyphs, or what we would call characters. The first is 6
code points (U+391, U+313, U+3B2, U+3B9, U+3B1, U+301); the second is 4
code points (U+1F08, U+3B2, U+3B9, U+3AC).

In this case the decomposed sequences happen to match composed
characters, but that is not always true: some less common composed
glyphs do not have a unique single code point assigned to them. This
shows that 1 code point does not equal 1 character, for the usual user
definition of a character.

There are a NUMBER of places in Unicode where it takes multiple code
points to express a single glyph to the user. Very shortly after they
realized they needed to extend Unicode beyond the initial 16-bit
character set they first thought it could be, they also realized that
they could never reach the goal of assigning a unique code point to the
basic glyphs of every language, so they settled on letting some (many)
glyphs be expressed as a combination of code points, with somewhat
simple (but not trivial) rules on how to do this.

-- 
Richard Damon
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
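
For anyone who wants to check the Ἀβιά example by machine, here is a minimal Python sketch using the standard unicodedata module. The two strings are written as explicit escapes so that a mail client cannot quietly renormalize them. The helper names (code_points, utf32_bytes, approx_glyph_count) are made up for this illustration, and approx_glyph_count is only a rough stand-in: it counts code points that are not combining marks, which works for this example but is not the full grapheme cluster segmentation that Unicode defines.

```python
import unicodedata

# The two spellings from the example, as explicit code point escapes.
decomposed = "\u0391\u0313\u03b2\u03b9\u03b1\u0301"  # 6 code points
composed = "\u1f08\u03b2\u03b9\u03ac"                # 4 code points


def code_points(s):
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return ["U+%04X" % ord(c) for c in s]


def utf32_bytes(s):
    """Size of s in UTF-32 without a BOM: always 4 bytes per code point."""
    return len(s.encode("utf-32-le"))


def approx_glyph_count(s):
    """Rough glyph count: code points whose combining class is 0.

    This is an approximation for illustration only, not real grapheme
    cluster segmentation.
    """
    return sum(1 for c in s if unicodedata.combining(c) == 0)


print(code_points(decomposed))
print(code_points(composed))

# UTF-32 gives one code unit per code point, so the sizes differ ...
print(utf32_bytes(decomposed), utf32_bytes(composed))  # 24 16

# ... even though both strings show the same number of glyphs.
print(approx_glyph_count(decomposed), approx_glyph_count(composed))  # 4 4

# A raw comparison says the strings differ; normalizing makes them equal.
print(decomposed == composed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

So asking for "the first 4 characters" of the decomposed string in UTF-32 would cut it off after the iota, one glyph short of what the user sees.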