Hello bughunter, before posting a question to any discussion group, it is recommended to read (and understand) the pertinent FAQ list; otherwise the ensuing discussion will focus on definitions and terms rather than on the problem at hand. You may start reading at <https://www.unicode.org/faq/>.
That said, I’ll try to answer your question. As your problem is not quite clear, you’ll get basically three answers, plus a technical hint pertaining to two of them. You asked:
Where to get the sourcecode of relevent (version) UTF-8?: in order to checksum text against the specific encoding map (codepage).
My answer depends on the purpose of the checksum. UTF-8 is one of a handful of standardized methods to represent Unicode text at the bit level for convenient transfer or storage. If the intent of your checksum is merely to protect against transmission errors, or tampering, then you would simply checksum this bit-level representation of the text; no knowledge of Unicode, or of UTFs, is required to achieve this goal.

A Unicode code point is a number in the range from 0 to 1 114 111; a Unicode text is a sequence of Unicode code points. On the bit level, you can represent that sequence in various ways, cf. <https://www.unicode.org/faq/utf_bom.html>. Hence, if you want to compare two Unicode texts that are given in arbitrary bit-level representations (UTFs), then you would convert both to the same UTF (preferably UTF-32) and checksum those. (UTF-32 stores the 21 bits needed to represent a Unicode code point in one 32-bit-wide storage location, leaving 11 bits unused.)

In Unicode, some characters can be represented in several ways; e. g. an “é” can be coded as a single Unicode code point, viz. U+00E9 LATIN SMALL LETTER E WITH ACUTE, or, alternatively, as a pair of Unicode code points, viz. U+0065 U+0301 LATIN SMALL LETTER E + COMBINING ACUTE ACCENT. To cope with ambiguities of this kind, Unicode defines those two representations as “canonically equivalent”, i. e. they are to be treated in every respect as equivalent and interchangeable; for details, cf. <https://www.unicode.org/faq/normalization.html>. Hence, if you want to check that two Unicode texts are canonically equivalent, you would first convert them to UTF-32, then ‘normalize’ them (i. e. consistently choose the same representation for all instances of canonically equivalent encodings), and then checksum the normalized representations.
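To make the two cases concrete, here is a small sketch in Python (using the standard-library modules `unicodedata` and `hashlib`; the function name `normalized_checksum` and the choice of SHA-256 and NFC are just illustrative assumptions, not anything your project is required to use). It shows that the raw bit-level checksums of two canonically equivalent spellings of “é” differ, while checksums taken after normalization agree:

```python
import hashlib
import unicodedata

# Two canonically equivalent spellings of the character "é":
precomposed = "\u00E9"        # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"   # U+0065 + U+0301 COMBINING ACUTE ACCENT

# Case 1: checksumming the raw bit-level representation.
# The UTF-8 byte sequences differ, so the checksums differ too.
assert precomposed.encode("utf-8") != decomposed.encode("utf-8")

def normalized_checksum(text: str) -> str:
    """Illustrative helper: checksum the canonical (NFC) form,
    encoded as UTF-32 big-endian, with SHA-256."""
    nfc = unicodedata.normalize("NFC", text)
    return hashlib.sha256(nfc.encode("utf-32-be")).hexdigest()

# Case 2: checksumming after normalization.
# Canonically equivalent texts now produce identical checksums.
assert normalized_checksum(precomposed) == normalized_checksum(decomposed)
print("normalized checksums match")
```

Whether you normalize to NFC (composed) or NFD (decomposed) does not matter for the comparison, as long as both texts are normalized to the same form before checksumming.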
You were asking for source code, but the better way to do conversions and normalization is to use an established and well-tested program library, such as ICU, cf. <https://icu.unicode.org/#h.i33fakvpjb7o>. Good luck with your project, Otto
