Coming late to this discussion. Very excited by this approach of converting
everything to UTF-32 in order to do fast offsets.
I'm really confused that case-insensitive matching should work at all for
UTF-16 or UTF-32 data; at this point, as far as I understand it, LC has no idea
how to correctly interpret the value of the variable as text. Or at least, I'd
expect it to work for some things, e.g. A/a, which are the same as single
bytes, and _also_ for Å/å, because those are also equivalently 'single byte'
(0xC5 and 0xE5); but not for e.g. Ă/ă, which are 0x0102 and 0x0103, where I
wouldn't expect 0x03 to be considered a case-shifted version of 0x02. All this
just proves that I don't understand what the new(ish) engine is doing with
strings.
I'm going to start a new thread to explore this.
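To make the byte values above concrete, here is a minimal Python sketch. It illustrates the expectation described above, not what the engine actually does. It encodes each pair as UTF-16LE, where 0x0102 is stored low byte first as 02 01, and applies a hypothetical `naive_byte_fold` that lower-cases each byte as if it were a Latin-1 character. That fold is an assumed model of a byte-oriented comparison, not LiveCode's implementation.

```python
def naive_byte_fold(data: bytes) -> bytes:
    """Hypothetical byte-wise fold: treat every byte as a Latin-1 character
    and lower-case it. An assumed model only, not LiveCode's algorithm."""
    return data.decode("latin-1").lower().encode("latin-1")


def folds_equal_utf16(a: str, b: str) -> bool:
    """Compare two strings' UTF-16LE bytes after the naive byte fold."""
    return naive_byte_fold(a.encode("utf-16-le")) == naive_byte_fold(b.encode("utf-16-le"))


for upper, lower in [("A", "a"), ("\u00c5", "\u00e5"), ("\u0102", "\u0103")]:
    print(upper, lower,
          upper.encode("utf-16-le").hex(" "), lower.encode("utf-16-le").hex(" "),
          folds_equal_utf16(upper, lower))
# A a 41 00 61 00 True
# Å å c5 00 e5 00 True
# Ă ă 02 01 03 01 False
```

Under this model A/a and Å/å fold together because each differs only in one byte below 0x100, while Ă/ă differ in a byte (02 vs 03) that a Latin-1 fold treats as a control character.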
In the meantime I'd be suspicious of doing a case-insensitive search this way.
My guess is that, if your use case can accept case sensitivity, it would be
safer (and faster?) to use byteOffset on the UTF-32 data rather than offset.
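A minimal Python sketch of that case-sensitive byte search, assuming little-endian UTF-32 (I haven't confirmed which byte order LiveCode's textEncode produces) and using `bytes.find` as a stand-in for byteOffset. `bytes.find` is 0-based, so a byte index i maps to character position i // 4 + 1. The helper name is hypothetical, and this version deliberately omits any four-byte boundary check.

```python
def utf32_byte_find_all(haystack: str, needle: str) -> list[int]:
    """Case-sensitive search on UTF-32LE bytes; returns 1-based character
    positions of every match (overlaps allowed). No alignment check."""
    hay = haystack.encode("utf-32-le")
    pat = needle.encode("utf-32-le")
    if not pat:
        return []
    positions = []
    start = hay.find(pat)
    while start != -1:
        positions.append(start // 4 + 1)  # byte index -> 1-based char index
        start = hay.find(pat, start + 1)
    return positions


HAY = "aa\u00e5\u00e5\u00c5\u00c5\u00e5\u00e5aa"  # "aaååÅÅååaa"
print(utf32_byte_find_all(HAY, "\u00e5\u00e5"))  # [3, 7]
```

With the test string from Brian's message this gives the case-sensitive answer, 3 and 7.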
Mr Very Picky would also suggest that, to be really correct, the code in this
case should also check that the offset found falls on a four-byte boundary
(tPos mod 4 = 1). It's probably a purely theoretical consideration, but I think
the four-byte sequence representing the character you're searching for could
in theory be matched incorrectly across two other characters.
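Here is a Python sketch showing that such a straddling match can occur, under the same assumptions (UTF-32LE, `bytes.find` standing in for byteOffset, 0-based offsets, hypothetical helper name). U+4100 (a CJK ideograph) is stored as 00 41 00 00 and U+0100 (Ā) as 00 01 00 00, so the bytes for "A" (41 00 00 00) appear across the two characters at byte 1. Keeping only offsets divisible by 4, the 0-based form of tPos mod 4 = 1, discards that match.

```python
def utf32_aligned_find_all(haystack: str, needle: str) -> list[int]:
    """Case-sensitive UTF-32LE byte search that keeps only matches starting on
    a 4-byte boundary; returns 1-based character positions."""
    hay = haystack.encode("utf-32-le")
    pat = needle.encode("utf-32-le")
    if not pat:
        return []
    positions = []
    start = hay.find(pat)
    while start != -1:
        if start % 4 == 0:  # 0-based; same test as (tPos mod 4 = 1) when 1-based
            positions.append(start // 4 + 1)
        start = hay.find(pat, start + 1)
    return positions


tricky = "\u4100\u0100"  # CJK ideograph U+4100 followed by Ā (U+0100)
print(tricky.encode("utf-32-le").hex(" "))                       # 00 41 00 00 00 01 00 00
print(tricky.encode("utf-32-le").find("A".encode("utf-32-le")))  # 1, a misaligned match
print(utf32_aligned_find_all(tricky, "A"))                        # []
```

So for text that mixes CJK and Latin characters, the boundary check may be more than a theoretical nicety.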
On 12/11/2018 05:00, Brian Milby via use-livecode wrote:
I just tried one additional test. Search for "åå" within "aaååÅÅååaa".
(On a Mac keyboard, the characters are made with A, Option-A, and
Shift-Option-A.) The Offset UTF16 version does not return the correct
result if case sensitive is false (returns the same value as if it were
true: 3,7). Every other version correctly performs the case folding
(3,4,5,6,7).
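For comparison, both result sets in the test above can be reproduced with a plain character-level search in Python. Here `str.casefold()` stands in for LiveCode's case-insensitive comparison, which is an assumption, and the helper name is hypothetical. Matches are allowed to overlap, which is why the folded search reports 3,4,5,6,7.

```python
def find_all(haystack: str, needle: str, case_sensitive: bool = True) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match.

    Assumes casefold() preserves string length, which holds for å/Å but not in
    general (for example "ß".casefold() == "ss").
    """
    if not case_sensitive:
        haystack, needle = haystack.casefold(), needle.casefold()
    if not needle:
        return []
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i + 1)
        i = haystack.find(needle, i + 1)
    return positions


HAY = "aa\u00e5\u00e5\u00c5\u00c5\u00e5\u00e5aa"  # "aaååÅÅååaa"
print(find_all(HAY, "\u00e5\u00e5", case_sensitive=True))   # [3, 7]
print(find_all(HAY, "\u00e5\u00e5", case_sensitive=False))  # [3, 4, 5, 6, 7]
```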
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode