On 2018-01-26 18:50, Tom Glod via use-livecode wrote:
Hi Everyone,

I want to ask how likely it is that, at some point in the future, some change in character encoding could start producing a different hash for the same sentence? Just thinking about the nightmare scenarios facing a project that heavily uses hashing to verify and address content... in international characters... to boot.

The hash/digest functions (e.g. sha1Digest) operate on binary data. So if you do:

  put sha1Digest("foobar")

Then "foobar" is first converted to binary data using the native encoding (i.e. the backwards-compatibility rule we have), then that is hashed.

In every case where you produce a hash you have to explicitly choose an encoding - so pick your favourite (Unicode-friendly!) encoding and do:

  get sha1Digest(textEncode(tMyString, tMyEncoding))
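
As a minimal sketch of that pattern (the helper name hashText is hypothetical, and "UTF-8" is just one reasonable choice of encoding), the encode-then-hash step can be wrapped up so the encoding is always stated explicitly and the digest comes back as readable hex:

  -- Hypothetical helper: hash a string through an explicitly chosen encoding.
  function hashText pText, pEncoding
     local tDigest, tHex
     -- textEncode turns the string into bytes; sha1Digest hashes those bytes
     put sha1Digest(textEncode(pText, pEncoding)) into tDigest
     -- the digest is raw binary data; "H*" unpacks it as hex digits
     get binaryDecode("H*", tDigest, tHex)
     return tHex
  end hashText

  -- Example use: always name the encoding at the call site
  put hashText(tMyString, "UTF-8") into tMyHash

Because the encoding is an explicit argument, the result does not depend on what the native encoding happens to be on a given platform or engine version.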

If you are generating hashes of strings to send to existing things, then it should say *somewhere* in the docs of the thing you are sending them to what encoding to use before applying the hash.

Also be aware that Unicode allows the 'same' string to be encoded in multiple ways - so it's probably wise to choose a normalization form first too (see normalizeText) - otherwise you could have two strings which look the same (e.g. e,acute / e-acute) but hash to a different value.
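
A rough sketch of the normalization point, assuming NFC as the chosen form and UTF-8 as the encoding (both are illustrative choices, and normalizedHash is a hypothetical helper name). The two strings below should render identically, yet they are made of different codepoints and therefore encode to different bytes:

  -- Hypothetical helper: normalize, encode, then hash; returns hex digits.
  function normalizedHash pText
     local tBytes, tHex
     put textEncode(normalizeText(pText, "NFC"), "UTF-8") into tBytes
     get binaryDecode("H*", sha1Digest(tBytes), tHex)
     return tHex
  end normalizedHash

  on mouseUp
     local tComposed, tDecomposed
     put numToCodepoint(233) into tComposed           -- U+00E9, precomposed e-acute
     put "e" & numToCodepoint(769) into tDecomposed   -- U+0065 then U+0301 (combining acute)
     -- Without normalization the byte sequences differ in length (expected 2 vs 3),
     -- so hashing them directly would give different digests
     put the number of bytes in textEncode(tComposed, "UTF-8") & return into msg
     put the number of bytes in textEncode(tDecomposed, "UTF-8") & return after msg
     -- After normalizing both to NFC the hex digests should match
     put (normalizedHash(tComposed) is normalizedHash(tDecomposed)) after msg
  end mouseUp

Whichever form you pick, the important thing is to apply the same one everywhere a hash is produced or checked.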

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

