Re: [sqlite] Things you shouldn't assume when you store names

Richard Damon Mon, 11 Nov 2019 08:21:21 -0800

On 11/11/19 10:49 AM, Jose Isaias Cabrera wrote:
>
> Richard Damon, on Monday, November 11, 2019 09:47 AM, wrote...
>> On 11/11/19 9:26 AM, Jose Isaias Cabrera wrote:
>>> Simon Slavin, on Monday, November 11, 2019 08:50 AM, wrote...
>>>> On 11 Nov 2019, at 1:35pm, Jose Isaias Cabrera, on
>>>>
>>>>> Not if the system uses UTF32. :-) You could put the pictograph in that 
>>>>> textbox, and it'll work.
>>>> Can you point to some description of this and how it works?  I've never 
>>>> heard of it.
>>> My point was that one could define the UTF32 [1] code for that specific 
>>> pictograph or glyph, and it'll work.
>>>
>>> josé
>>>
>>> [1] https://en.wikipedia.org/wiki/UTF-32
>> UTF-32 gives no encoding advantage over other Unicode formats, as all of
>> them can express every Unicode code point.
> I disagree.  I believe that the future is UTF32.  I will grant that it's 
> bulky. For example, here is the letter "a" written to a file in Windows-1252, 
> UTF8 signed, UTF16be signed, and UTF32be signed:
>
> bytes filename
> 1     0_Windows-1252.txt
> 4     1_UTF8signed.txt
> 4     2_UTF16BEsigned.txt
> 8     3_UTF32signed.txt
>
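For reference, the byte counts in the table can be reproduced with Python's standard codecs. This is a minimal sketch that assumes "signed" means "written with a byte-order mark (BOM, U+FEFF)", which is what the sizes imply; the dictionary labels are just the table's filenames paraphrased.

```python
# Reproduce the byte counts for the letter "a" from the table.
# Assumption: "signed" = the file starts with a byte-order mark (U+FEFF).
BOM = "\ufeff"

sizes = {
    "Windows-1252": len("a".encode("cp1252")),                # 61 (this code page has no BOM)
    "UTF-8 signed": len("a".encode("utf-8-sig")),             # EF BB BF 61
    "UTF-16BE signed": len((BOM + "a").encode("utf-16-be")),  # FE FF 00 61
    "UTF-32BE signed": len((BOM + "a").encode("utf-32-be")),  # 00 00 FE FF 00 00 00 61
}

for name, n in sizes.items():
    print(n, name)
```

Without the BOM the same letter takes 1, 2 and 4 bytes in UTF-8, UTF-16 and UTF-32, so for mostly-ASCII text UTF-32 is four times the size of UTF-8.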
> So, yes, it's bulky, but, if you want to count characters in languages such 
> as Arabic, Hebrew, Chinese, Japanese, etc., the easiest way is to convert 
> that string to UTF32, and do a string count of that UTF32 variable.  Most 
> people have to figure out which Unicode encoding they are using, count the bytes, 
> divide by... and on, and on.  Not me, I just take that UTF8, or UTF16 string, 
> convert it to UTF32, and do a count.
UTF-32 is a reasonable internal processing format if code-point
operations are important. It does not make a good transmission format,
as it usually takes more space than UTF-8 or UTF-16, and for
transmission, the message size is important. The big issue is that
code-point counting is rarely what you want; you generally want glyph
counting, which even UTF-32 doesn't provide.
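To make the code-point versus glyph distinction concrete, here is a small sketch in Python. A Python `str` is a sequence of code points, so `len()` gives the same answer as "convert to UTF-32 and count", and SQLite's `length()` on a TEXT value also counts characters (code points), not bytes. What a reader perceives as one character is what Unicode calls an extended grapheme cluster (UAX #29); the Python standard library does not segment those, so counting them would need a third-party library, for example the `regex` module's `\X` pattern. The sample strings are illustrative choices.

```python
import sqlite3
import unicodedata

# Strings a reader would see as a single character each.
samples = {
    "precomposed e-acute": "\u00e9",    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    "decomposed e-acute": "e\u0301",    # "e" + U+0301 COMBINING ACUTE ACCENT
    "US flag": "\U0001F1FA\U0001F1F8",  # two REGIONAL INDICATOR SYMBOL letters
}

con = sqlite3.connect(":memory:")
results = {}
for label, s in samples.items():
    code_points = len(s)                           # Python counts code points
    utf32_units = len(s.encode("utf-32-le")) // 4  # "convert to UTF32 and count" (-le adds no BOM)
    sqlite_len = con.execute("SELECT length(?)", (s,)).fetchone()[0]  # SQLite counts characters
    nfc_len = len(unicodedata.normalize("NFC", s))  # NFC composes e + accent, not the flag
    results[label] = (code_points, utf32_units, sqlite_len, nfc_len)
    print(label, results[label])
```

All three counting methods agree with each other, and all of them report 2 for the decomposed accent and the flag. Normalization to NFC fixes only the cases that have a precomposed form, which is why it is not a substitute for grapheme segmentation.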
>
>> There is no code point assigned to the pictogram for his name (as far as
>> I know), so there is no value you can put in to represent it.
> You're right, but not that many people are changing their name to an image.  
> However, if two or three or more folks want to, there are enough empty UTF32 
> characters that it can be accomplished.
But this shows that 'Unicode' doesn't handle the name as is, which was
the point of the rule: if you design your software assuming that
Unicode can handle all names, you will, very occasionally, be wrong.
There are actually many more cases of this. I imagine a lot of
aboriginal people whose writing systems haven't been adopted by
Unicode have names (their preferred names) that can't be
expressed in official Unicode. They may have a government-assigned
'official' name (if they have had to interact with the government) that
can be represented, but that really isn't their name (Prince just had
the resources and gall to do it 'officially').
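The "empty" code points mentioned above presumably correspond to Unicode's Private Use Areas: U+E000..U+F8FF, plus planes 15 and 16 (U+F0000..U+FFFFD and U+100000..U+10FFFD). The sketch below, under that assumption, shows that such a code point carries no standard name or meaning, yet travels equally well in UTF-8, UTF-16 and UTF-32 and through SQLite. The helper `is_private_use` and the choice of U+E000 for an unencoded symbol are hypothetical.

```python
import sqlite3
import unicodedata

# Private Use Area ranges reserved by the Unicode Standard.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_private_use(ch: str) -> bool:
    """Hypothetical helper: True if ch falls in a Private Use Area."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Hypothetical assignment: this application lets U+E000 stand for a
# symbol Unicode does not encode. Nothing outside the app knows that.
custom = "\ue000"

print(is_private_use(custom), unicodedata.category(custom))  # True Co
print(unicodedata.name(custom, None))                        # None: no standard name

# Every Unicode encoding form can carry it; UTF-32 is not special here.
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    encoded = custom.encode(codec)
    assert encoded.decode(codec) == custom
    print(codec, len(encoded), "bytes")

# SQLite stores and returns it unchanged.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people(name TEXT)")
con.execute("INSERT INTO people VALUES (?)", (custom,))
stored = con.execute("SELECT name FROM people").fetchone()[0]
print(stored == custom)  # True
```

Storing the code point is the easy part; its meaning exists only by private agreement, which is the limitation being discussed.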
>
>
>> It would be possible to include in the application some way to add user-
>> defined glyphs to the system fonts for user-defined code points, and
>> then reconcile these when transferring data from one system to another.
> We have done this for special customer requirements and have assigned our own 
> UTF32 characters a specific design with our software.  But, yes, it's only 
> our software, but what if... a reconciliation can happen?
>
> josé
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
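One way the reconciliation idea could look, as a rough sketch only: each database records which private-use code point it assigned to which externally agreed symbol key, and names are rewritten through that key when moving between systems. The schema (`pua_map`, `symbol_key`), the function names and the key string "example:symbol-1" are all hypothetical, and distributing the glyphs (fonts) themselves is not covered.

```python
import sqlite3

# Hypothetical schema: which private-use code point this database uses
# for which externally agreed symbol key.
SCHEMA = """
CREATE TABLE pua_map(
    codepoint  INTEGER PRIMARY KEY,
    symbol_key TEXT UNIQUE NOT NULL
);
CREATE TABLE people(name TEXT);
"""

def open_db(assignments: dict[int, str]) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO pua_map VALUES (?, ?)", assignments.items())
    return con

def transfer_name(name: str, src: sqlite3.Connection, dst: sqlite3.Connection) -> str:
    """Rewrite src's registered private-use code points as dst's equivalents."""
    out = []
    for ch in name:
        row = src.execute(
            "SELECT symbol_key FROM pua_map WHERE codepoint = ?", (ord(ch),)
        ).fetchone()
        if row is None:  # ordinary character, or one src never registered
            out.append(ch)
            continue
        hit = dst.execute(
            "SELECT codepoint FROM pua_map WHERE symbol_key = ?", (row[0],)
        ).fetchone()
        if hit is None:
            raise LookupError(f"destination has no code point for {row[0]!r}")
        out.append(chr(hit[0]))
    return "".join(out)

# Two systems that picked different code points for the same symbol.
src = open_db({0xE000: "example:symbol-1"})
dst = open_db({0xE123: "example:symbol-1"})

moved = transfer_name("Mr. \ue000", src, dst)
dst.execute("INSERT INTO people VALUES (?)", (moved,))
print(moved == "Mr. \ue123")  # True
```

Raising an error when the destination has no mapping is a deliberate choice here; silently copying the source code point would store a character that means something else, or nothing, on the other system.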



-- 
Richard Damon

