Re: [sqlite] Things you shouldn't assume when you store names

Richard Damon Mon, 11 Nov 2019 13:35:12 -0800

On 11/11/19 2:57 PM, Jose Isaias Cabrera wrote:
> Igor Tandetnik, on Monday, November 11, 2019 02:24 PM, wrote...
>> On 11/11/2019 12:50 PM, Richard Damon wrote:
>>> Writing 20 UTF-32 characters may ALSO print less than 20 glyphs to the
>>> screen.
>> Or more, depending on what you mean by  "glyph". See e.g. U+FDFB (ARABIC
>> LIGATURE JALLAJALALOUHOU,
>> https://www.fileformat.info/info/unicode/char/fdfb/index.htm ) or U+FB03
>> (LATIN SMALL LIGATURE FFI,
>> https://www.fileformat.info/info/unicode/char/fb03/index.htm)
> Thanks for this, Igor.  Again, UTF32 has lots of space, still.  If you look 
> at the representation of these two characters,
>
> ARABIC LIGATURE JALLAJALALOUHOU UTF-32 (hex) 0x0000FDFB (fdfb)
> LATIN SMALL LIGATURE FFI UTF-32 (hex) 0x0000FB03 (fb03)
>
> Look at their hex representations in UTF32:
> 1. 0x0000FDFB
> 2. 0x0000FB03
>
> The first 4 0's are still unused spaces.  Japanese, Chinese, etc., glyphs
> have a unique UTF32 code, so, it will always work.
>
> josé
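
To make the code-point-versus-glyph point above concrete, here is a
minimal Python sketch. It only counts code points and applies the
standard normalization forms; it cannot count glyphs, since that depends
on the font and renderer. The helper names (code_points, compat_expand)
are mine, purely for illustration.

```python
import unicodedata

# One code point that stands for three letters (from the thread above).
ffi = "\uFB03"        # LATIN SMALL LIGATURE FFI
# Two code points that normally render as a single glyph.
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def code_points(s):
    """Number of code points, i.e. the length a UTF-32 buffer would need."""
    return len(s)


def compat_expand(s):
    """Compatibility-normalize (NFKC), which expands ligatures like U+FB03."""
    return unicodedata.normalize("NFKC", s)


print(unicodedata.name(ffi), code_points(ffi), repr(compat_expand(ffi)))
print(code_points(e_acute), code_points(unicodedata.normalize("NFC", e_acute)))
```

So a fixed count of UTF-32 units tells you neither how many letters nor
how many glyphs end up on the screen.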


Unicode has fixed the highest valid code point at 0x10FFFF, because
anything higher cannot be encoded in UTF-16, so there isn't as much
room as you might think.

This gives us 1,114,112 possible code points.
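
For anyone wondering where that ceiling comes from, here is a rough
Python sketch of UTF-16 surrogate-pair arithmetic. The function names
are just illustrative, not from any library. A pair carries 10 + 10 bits
on top of a 0x10000 offset, so the largest value a pair can express is
0x10FFFF, which works out to 17 planes of 65,536 code points.

```python
HIGH_START = 0xD800  # first high (lead) surrogate
LOW_START = 0xDC00   # first low (trail) surrogate


def to_surrogates(cp):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not encodable as a surrogate pair: %#x" % cp)
    offset = cp - 0x10000  # fits in 20 bits
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a surrogate pair into a code point."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest possible pair is (0xDBFF, 0xDFFF).
max_cp = from_surrogates(0xDBFF, 0xDFFF)
total_code_points = max_cp + 1

print(hex(max_cp), total_code_points, 17 * 0x10000)
```

There is simply no pair left to represent 0x110000, which is why UTF-32
cannot use the upper bits even though they are physically there.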

There are currently 137,994 code points assigned to characters, 66
designated as non-characters, 2,048 reserved for the surrogates, and
137,468 reserved for private use, leaving 836,536 currently unassigned.
So we do have some space to grow, but there are still a lot of
archaic and unusual scripts that are being proposed or worked on.
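
As a sanity check on the arithmetic, here is a small Python sketch of
the budget. The assigned-character count is the figure quoted in this
message; the private-use size is computed from the three private-use
ranges, which is my assumption about what is being counted.

```python
TOTAL = 0x10FFFF + 1          # 1,114,112 code points in all

assigned = 137_994            # figure quoted in this message
noncharacters = 32 + 2 * 17   # U+FDD0..U+FDEF plus the last two of each plane
surrogates = 0xDFFF - 0xD800 + 1

# Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16
# (each minus its two trailing non-characters).
private_use = (0xF8FF - 0xE000 + 1) + 2 * (0x10000 - 2)

unassigned = TOTAL - assigned - noncharacters - surrogates - private_use

print(f"{private_use:,} private use, {unassigned:,} unassigned")
```

The result agrees with the unassigned total given above.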

-- 
Richard Damon

_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
