Coming late to this discussion. Very excited by this approach of converting everything to UTF-32 in order to do fast offsets.

I'm really confused that case-insensitive matching should work at all for UTF-16 or UTF-32; at this point, as far as I understand it, LC has no idea how to correctly interpret the value of the variable as text. Or at least, I'd expect it to work for some things - e.g. A/a, which are the same as single bytes; and _also_ for Å/å, because those are also equivalently 'single byte' - 0xC5 and 0xE5; but not for e.g. Ă/ă, which are 0x0102 and 0x0103, where I wouldn't expect 0x03 to be considered a case-shifted version of 0x02. All this just proves that I don't understand what the new(ish) engine is doing with strings. I'm going to start a new thread to explore this.
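
To make that expectation concrete, here is a minimal Python sketch (not LiveCode, and not a claim about what the engine actually does). It assumes little-endian UTF-16 without a BOM, and bytewise_fold is a hypothetical helper that lower-cases each byte as if it were a Latin-1 character, which is the kind of single-byte folding I have in mind:

```python
ENCODING = "utf-16-le"  # assumption: little-endian UTF-16, no BOM


def bytewise_fold(data):
    """Hypothetical single-byte case fold: treat each byte as a Latin-1
    character, lower-case it, and turn it back into a byte."""
    return data.decode("latin-1").lower().encode("latin-1")


def folds_equal(a, b):
    """True if the two strings look identical after encoding and byte folding."""
    return bytewise_fold(a.encode(ENCODING)) == bytewise_fold(b.encode(ENCODING))


if __name__ == "__main__":
    for upper, lower in [("A", "a"), ("Å", "å"), ("Ă", "ă")]:
        print(upper, upper.encode(ENCODING).hex(),
              lower, lower.encode(ENCODING).hex(),
              folds_equal(upper, lower))
```

Under those assumptions A/a and Å/å compare equal, because their significant byte is a Latin-1 letter with a Latin-1 lower-case form, while Ă/ă do not, because the bytes 0x02 and 0x03 have no case relationship.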

In the meantime I'd be suspicious of doing a case-insensitive search this way. My guess is that, if your use case can accept case-sensitivity, it would be safer (and faster?) to use byteOffset on the UTF-32 data rather than offset.

Mr Very Picky would also suggest that to be really correct, the code in this case should also check that the offset found was on a four-byte boundary (tPos mod 4 = 1); it's probably a purely theoretical consideration, but I think that the four-byte sequence (representing the character you're searching for) could in theory be incorrectly matched across two other characters.
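
A minimal Python sketch of that boundary check, again as an illustration rather than LiveCode code. It assumes little-endian UTF-32 without a BOM; byte_matches is a hypothetical helper, and it reports 0-based byte offsets, so "on a four-byte boundary" is pos % 4 == 0 here, which corresponds to tPos mod 4 = 1 with LiveCode's 1-based offsets:

```python
ENCODING = "utf-32-le"  # assumption: little-endian UTF-32, no BOM


def byte_matches(needle, haystack, aligned_only=True):
    """Return the 0-based byte offsets at which the encoded needle occurs
    in the encoded haystack, including overlapping occurrences.

    With aligned_only=True, offsets that do not fall on a four-byte
    boundary are discarded, since they straddle two characters.
    """
    n = needle.encode(ENCODING)
    h = haystack.encode(ENCODING)
    hits = []
    pos = h.find(n)
    while pos != -1:
        if not aligned_only or pos % 4 == 0:
            hits.append(pos)
        pos = h.find(n, pos + 1)
    return hits


if __name__ == "__main__":
    # U+10000 followed by U+0200 encodes as 00 00 01 00 | 00 02 00 00.
    # The bytes of U+0100 (00 01 00 00) appear at offset 1, across both.
    hay = "\U00010000\u0200"
    print(byte_matches("\u0100", hay, aligned_only=False))
    print(byte_matches("\u0100", hay))
```

In this constructed case the unchecked search reports a hit at byte offset 1 even though U+0100 is not in the text, and the aligned search reports nothing. With ordinary text such as searching for "å" in "aaååÅÅååaa", every hit is already aligned, which is why I think the problem is mostly theoretical.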

On 12/11/2018 05:00, Brian Milby via use-livecode wrote:
I just tried one additional test.  Search for "åå" within "aaååÅÅååaa".
(On a Mac keyboard, the characters are made with A, Option-A, and
Shift-Option-A.)  The Offset UTF16 version does not return the correct
result if case sensitive is false (returns the same value as if it were
true: 3,7).  Every other version correctly performs the case folding
(3,4,5,6,7).
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


