On 2018-11-13 01:06, Geoff Canyon via use-livecode wrote:
On Mon, Nov 12, 2018 at 11:36 AM Ben Rubinstein via use-livecode <
use-livecode@lists.runrev.com> wrote:

I'm really confused that case-insensitive should work at all for UTF-16 or
UTF-32;

The caseSensitive (and formSensitive) properties apply only to strings, *not* to binary strings.

The output of textEncode() is a binary string.

The 'is' operator is overloaded - in strict order:

  left-empty 'is' right-ANY -- returns is-empty(right-ANY)
  left-ANY 'is' right-empty -- returns is-empty(left-ANY)
  left-array 'is' right-array -- compare as array
  left-number 'is' right-number -- compare as number
  left-numeric-[binary]-string 'is' right-numeric-[binary]-string -- compare as number
  left-binary-string 'is' right-binary-string -- compare as binary strings
  left-any 'is' right-any -- compare as strings
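
As a rough illustration of that ordering, the sketch below exercises three of the rules. The variable names are made up for the example, and the comments name the rule that should apply according to the list above rather than quoting verified output:

```livecode
on mouseUp
   local tUpper, tLower, tReport
   set the caseSensitive to false
   -- textEncode() returns binary strings
   put textEncode("A", "UTF-8") into tUpper
   put textEncode("a", "UTF-8") into tLower
   -- string 'is' string: compared as strings, so caseSensitive applies
   put ("A" is "a") into tReport
   -- binary-string 'is' binary-string: compared byte for byte
   put space & (tUpper is tLower) after tReport
   -- numeric-string 'is' numeric-string: compared as numbers
   put space & ("1.0" is "1") after tReport
   put tReport
end mouseUp
```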

Also concatenation, put after and put before are overloaded:

   binary-string & binary-string -> binary-string
   string & ANY -> string
   ANY & string -> string

   put src-data after|before dst-data -> dst-data is binary-string
   put src-ANY after|before dst-ANY -> dst-ANY is string
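
A minimal sketch of the concatenation rules, assuming a LiveCode version that provides the 'is strictly' type-test operators (that availability is an assumption here, as are the variable names). If the rules above hold, both tests should report true:

```livecode
on mouseUp
   local tData, tJoined, tMixed
   put textEncode("Ѡ", "UTF-32") into tData
   -- binary-string & binary-string -> binary-string
   put tData & tData into tJoined
   -- ANY & string -> string
   put tData & "x" into tMixed
   put (tJoined is strictly a binary string) && (tMixed is strictly a string)
end mouseUp
```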

This is so puzzling. I tried this code in a button:

on mouseUp
   put "Ѡ" into x
   put "ѡ" into y
   --put ("Ѡ" is "ѡ") && (x is y)
   --exit mouseUp
   put textencode("Ѡ","UTF-32") into xBig
   put textencode("ѡ","UTF-32") into xSmall
   repeat for each byte B in xBig
      put B after yBig
   end repeat
   repeat for each byte B in xSmall
      put B after ySmall
   end repeat
   put "Ѡ" into zBig
   put "ѡ" into zSmall
   put zBig into wBig
   put zSmall into wSmall
   put textencode(zBig,"UTF-32") into zBig
   put textencode(zSmall,"UTF-32") into zSmall
   put x into j
   put y into k
   set caseSensitive to false
   put ("Ѡ" is "ѡ") && (xBig is xSmall) && (yBig is ySmall) && (zBig is zSmall) \
         && (wBig is wSmall) && (x is y) && (j is k)
end mouseUp


That puts: true false false false true true true

Things to note:

1. "Ѡ" and "ѡ" are upper- and lower-case omega in Cyrillic, 00000460 and
00000461 in UTF-32. Given the string literals, LC is happy to say they are
the same (the first true)
2. Put them in a variable, LC is happy to say they are the same
(the second-to-last true).
3. Convert them to UTF-32 and LC no longer recognizes them as the same (the
fourth boolean, false)
4. Put the variables into other variables, and LC identifies them as the
same (the last true)

("Ѡ" is "ѡ") is true because they are both strings
(xBig is xSmall) is false because both sides are binary-strings (and so compare byte for byte)
(yBig is ySmall) is false because both sides are binary-strings
(zBig is zSmall) is false because you've textEncoded strings which produce binary-strings so both are binary strings
(wBig is wSmall) is true because both sides are strings
(x is y) is true because both sides are strings
(j is k) is true because both sides are strings
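
Since each result above turns on whether both sides are strings, one way to get a case-insensitive comparison of encoded text is to decode it back to a string first. This is a sketch under that assumption, with hypothetical variable names. The first comparison is the one reported as false above, and the second should fall under the string rule:

```livecode
on mouseUp
   local tBig, tSmall
   put textEncode("Ѡ", "UTF-32") into tBig
   put textEncode("ѡ", "UTF-32") into tSmall
   set the caseSensitive to false
   -- first test: binary strings, compared byte for byte
   -- second test: textDecode() returns strings, so caseSensitive applies
   put (tBig is tSmall) && \
         (textDecode(tBig, "UTF-32") is textDecode(tSmall, "UTF-32"))
end mouseUp
```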

One could argue that 'is'/'is not' should never have been overloaded to do binary string comparison, and that this should perhaps have been added as a separate operator (especially since binary strings are compared as numbers if numeric). With hindsight I'd probably agree, as the overload introduces a slight discontinuity in comparison behaviour relative to pre-7 versions.

Indeed, had we not added that overload, we would not be having this discussion. Instead it would have been similar to the one that used to come up a lot about comparing the output of compress() and other functions that have always produced binary data, and why those comparisons seemed 'not as one would expect'.

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps
