Re: Unicode mysteries

Neville Smythe via use-livecode Thu, 26 Mar 2020 11:24:22 -0700

> 
>> Which should correspond to codepoints
>>       1F3F4 E0067 E0062 E0073 E0063 E0074 E007F
>> And indeed if I manually build a UTF-16 string with these code points
>> it does display as the flag of Scotland. So the lesson is that the
>> reported chunks are not to be naively trusted  --- tho not exactly a
>> bug given the documentation warning.
> 
> Well this would be a bug! If you try codepoint 1..14 - then you will see 
> that they alternate between a codepoint and zero - the codepoints appear 
> to correspond to the relevant surrogate pair codeunits. i.e. codepoint 
> is misinterpreting the index as a codeunit index, rather than a 
> codepoint index :|
> 
> If you file a bug then I suspect this can be fixed quite quickly (famous 
> last words of course!).



Thanks Mark, I will file a bug report.
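
For the bug report, here is a minimal sketch (in Python rather than LC, purely as an illustration) of why an index range of 1..14 fits Mark's explanation. All seven codepoints quoted above lie outside the Basic Multilingual Plane, so each one takes a surrogate pair in UTF-16, giving 14 code units. The function name utf16_code_units is my own, not anything from LiveCode.

```python
# Code points quoted in the thread for the flag of Scotland.
SCOTLAND = [0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F]


def utf16_code_units(code_points):
    """Return the UTF-16 code units that encode a list of code points."""
    text = "".join(chr(cp) for cp in code_points)
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


units = utf16_code_units(SCOTLAND)
print(len(SCOTLAND), "code points,", len(units), "UTF-16 code units")
print(" ".join(f"{u:04X}" for u in units))
```

If a codepoint index is treated as a code unit index, indices 1 to 14 would walk over these surrogate halves instead of the 7 codepoints.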

I don’t *really* need the actual font the system uses to display unsupported 
codepoints. I was thinking of using it as a lazy way to find out which single 
codepoints are supported rather than having to parse the cmap tables in the 
font file. As a way of learning about Unicode I was trying to write an LC 
version of the character map/PopChar utilities; a project doomed to failure 
because it’s just too hard to find out which multi-codepoint glyphs are 
supported by a font. This is a question frequently asked on forums, but it 
seems there is no answer other than reverse engineering the morx table in the 
font file, which is way too complex to be worth the effort. There is a published 
list for Emoji fonts but that would not be possible for general ligatures or 
glyph variations presumably.

Any comment on the LC behaviour of treating the Rainbow flag (which is a 
multi-codepoint glyph composed of three characters) as 3 separate text 
characters, requiring 3 backspace operations to delete it in a field, rather 
than a single backspace as works in TextEdit?  [The first backspace eliminates 
the rainbow flag glyph but leaves the white flag showing; the second backspace 
eliminates the invisible joiner codepoint, so to the user it seems to do 
nothing; the third backspace finally eliminates the last glyph.]  
Is this a design choice or a bug?
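
To make the two behaviours concrete, here is a rough sketch in Python (not LC). It assumes the flag in the field is the three-codepoint form U+1F3F3 U+200D U+1F308; the fully qualified form also carries a variation selector U+FE0F after the white flag. The helper names are hypothetical, and backspace_zwj_sequence is a deliberate simplification that only knows about ZWJ and VS16. It is not the full grapheme cluster algorithm and I am not claiming it is what TextEdit does internally.

```python
ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # variation selector-16 (emoji presentation)

WHITE_FLAG = "\U0001F3F3"
RAINBOW = "\U0001F308"
RAINBOW_FLAG = WHITE_FLAG + ZWJ + RAINBOW  # assumed three-codepoint form


def backspace_codepoint(text):
    """Delete exactly one code point from the end of the text."""
    return text[:-1]


def backspace_zwj_sequence(text):
    """Delete the last code point plus anything joined to it by ZWJ or VS16.

    Simplified illustration only: real grapheme segmentation has more rules.
    """
    i = len(text)
    if i == 0:
        return text
    i -= 1  # always remove at least the last code point
    while i > 0:
        if text[i] == VS16:
            i -= 1  # a variation selector belongs to the code point before it
        elif i >= 2 and text[i - 1] == ZWJ:
            i -= 2  # remove the joiner and the element it joins to
        else:
            break
    return text[:i]


field = "A" + RAINBOW_FLAG
step = field
for n in range(1, 4):
    step = backspace_codepoint(step)
    print("codepoint backspace", n, [f"U+{ord(c):04X}" for c in step])

print("sequence backspace", [f"U+{ord(c):04X}" for c in backspace_zwj_sequence(field)])
```

The first function needs three calls to clear the flag, passing through the white flag plus joiner state described above; the second clears it in one call and leaves the preceding "A" alone.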

Bob: I am looking at the Digest, where nonstandard characters (even, 
annoyingly, quotes) are replaced by question marks, which makes code snippets 
very hard to read. Is there a setting I should change to fix this?

Neville
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
