On Monday, 04 November 2024, 00:43:29 (-05:00), A bughunter via Unicode wrote:
>
> No, it does not answer my question.
I don't think I'm alone in saying that your question is very unclear, in large
part because of your unusual use of certain terms. I don't think I've ever
encountered "bytecode" outside of Java implementations, and it never refers to
textual (prose) data, which is how you seem to use it. I still don't know what
"compile time UTF-8" is supposed to be, and I've read both your messages
multiple times.
> In order to fully authenticate: the codepage of the character to glyph map
> must be known.
To authenticate what? At the end of the day, you're always just authenticating
a stream of bits.
> I need the bytecode to glyph map of UTF-8 as it is used by my runtime
> software.
You want to map UTF-8-encoded code points to characters? (Glyphs are the
visual representations of characters, determined by the font.) In that case
the "map" is the Unicode Character Database. Each code point (encoded as one
to four bytes in UTF-8) maps to a character. Versions of the database are
freely accessible online.
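If it helps, here is a rough sketch in Python of what that lookup looks like
in practice. It assumes the standard-library unicodedata module, which bundles
one version of the database; the function name describe_utf8 is made up for
this illustration and is not part of any library.

```python
import unicodedata


def describe_utf8(data: bytes) -> list[tuple[str, str]]:
    """Decode UTF-8 bytes and pair each code point with its character name."""
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = data.decode("utf-8")
    # unicodedata.name() takes a default for code points that have no name
    # in the bundled database (for example, control characters).
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    # Version of the Unicode database shipped with this Python build.
    print("Unicode database version:", unicodedata.unidata_version)
    for code_point, name in describe_utf8("Sławomir".encode("utf-8")):
        print(code_point, name)
```

Note that nothing here involves glyphs: the result is the same whatever font
is installed, because the mapping is from bytes to code points to character
identities.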
But I am still very unsure of what you're asking for. Are you concerned that
code points may be reassigned in the future? That, for example, writing "Smith"
in version 16 may appear as "Smite" in a future version, and this affects the
apparent content of a checksummed text file? If so, that is prevented by the
Unicode Stability Policy; assigned code points cannot have their character
identity changed. I don't see any practical way of exploiting differences
between Unicode versions to alter the apparent content of text.
If you wish to checksum a text file encoded in UTF-8, any implementation of a
well-defined checksum algorithm will work. Your runtime doesn't matter. The
checksum will be on the bytes of the file. If you must know what version of the
Unicode Standard was used when creating the file -- and that's a strange thing
to want -- that would have to be included in the file prior to checksumming it.
That said, I remain confused about how the "source code" of anything is
supposed to help you.
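To make the checksum point concrete, here is a minimal sketch, assuming
SHA-256 via Python's standard hashlib as the "well-defined algorithm" (any
other would do). The helper name sha256_hex and the "Unicode-Version:" header
line are my own inventions for illustration, not an established convention.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


text = "Smith\n"

# The checksum is computed over the encoded bytes only; no font, glyph
# table, or Unicode database is consulted at any point.
payload = text.encode("utf-8")

# Hypothetical convention: record the Unicode version inside the file
# *before* checksumming, so the digest covers that statement too.
tagged = ("Unicode-Version: 16.0\n" + text).encode("utf-8")

print(sha256_hex(payload))
print(sha256_hex(tagged))
```

Any conforming SHA-256 implementation, on any runtime, yields the same digest
for the same bytes, which is why the runtime is irrelevant.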
Sławomir