On 2018-11-13 01:06, Geoff Canyon via use-livecode wrote:
On Mon, Nov 12, 2018 at 11:36 AM Ben Rubinstein via use-livecode <
use-livecode@lists.runrev.com> wrote:
I'm really confused that case-insensitive should work at all for
UTF-16 or
UTF-32;
The caseSensitive (and formSensitive) properties only apply to strings
*not* binary strings.
The output of textEncode() is a binary string.
The 'is' operator is overloaded - in strict order:
left-empty 'is' right-ANY -- returns is-empty(right-ANY)
left-ANY 'is' right-empty -- returns is-empty(left-ANY)
left-array 'is' left-array -- compare as array
left-number 'is' right-number -- compare as number
left-numeric-[binary]-string 'is' right-numeric-[binary]-string --
compare as number
left-binary-string 'is' right-binary-string -- compare as binary
strings
left-any 'is' right-any -- compare as strings
Also concatenation, put after and put before are overloaded:
binary-string & binary-string -> binary-string
string & ANY -> string
ANY & string -> string
put src-data after|before dst-data -> dst-data is binary-string
put src-ANY after|before dst-ANY -> dst-ANY is string
This is so puzzling. I tried this code in a button:
on mouseUp
put "Ѡ" into x
put "ѡ" into y
--put ("Ѡ" is "ѡ") && (x is y)
--exit mouseUp
put textencode("Ѡ","UTF-32") into xBig
put textencode("ѡ","UTF-32") into xSmall
repeat for each byte B in xBig
put B after yBig
end repeat
repeat for each byte B in xSmall
put B after ySmall
end repeat
put "Ѡ" into zBig
put "ѡ" into zSmall
put zBig into wBig
put zSamll into wSmall
put textencode(zBig,"UTF-32") into zBig
put textencode(zSmall,"UTF-32") into zSmall
put x into j
put y into k
set caseSensitive to false
put ("Ѡ" is "ѡ") && (xBig is xSmall) && (yBig is ySmall) && (zBig is
zSmall) && (wBig is wSmall) && (x is y) && (j is k)
end mouseUp
That puts: true false false false true true true
Things to note:
1. "Ѡ" and "ѡ" are upper and lower case omega in cyrillic, 00000460 and
00000461. Given the string literals, LC is happy to say they are the
same
(the first true)
2. Put them in a variable, LC is happy to say they are the same
(the second-to-last true).
3. Convert them to UTF-32 and LC no longer recognizes them as the same
(the
fourth boolean, false)
4. Put the variables into other variables, and LC identifies them as
the
same (the last true)
("Ѡ" is "ѡ") is true because they are both strings
(xBig is xSmall) is false because both sides are binary-strings (and so
compare byte for byte)
(yBig is ySmall) is false because both sides are binary-strings
(zBig is zSmall) is false because you've textEncoded strings which
produce binary-strings so both are binary strings
(wBig is wSmall) is true because both sides are strings
(x is y) is true because both sides are strings
(j is k) is true because both sides are strings
One could argue that 'is'/'is not' should never have been overloaded to
do binary string comparison - and that should have perhaps been added as
a separate operator (especially since binary strings are compared as
numbers if numeric). With hindsight I'd probably agree as it is a slight
discontinuity in terms of comparison with pre-7.
Indeed, had we not added that overload then we would not be having this
discussion - it would have been a similar discussion as used to come up
a lot with comparing the output of compress() and other functions which
have always produced binary data - and why comparisons seemed 'not as
one would expect'.
Warmest Regards,
Mark.
--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode