get the sourcecode [of UTF-8] (reflections on)

A bughunter via Unicode Fri, 08 Nov 2024 06:54:28 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

The original question, concise yet complete, simple, one line, relevant, and 
on topic:


Where can I get the source code of the relevant version of UTF-8, in order to 
checksum text against the specific encoding map (code page)?
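
For illustration only, here is a minimal Python sketch of what checksumming 
text as UTF-8 could look like. It assumes SHA-512 as the digest (the hash named 
in this message's header) and uses a hypothetical helper name, utf8_checksum; 
the question itself specifies neither. It checks that the bytes are well-formed 
UTF-8 and then hashes those exact bytes.

```python
import hashlib


def utf8_checksum(data: bytes) -> str:
    """Return the SHA-512 hex digest of data if it is well-formed UTF-8."""
    # Strict decoding raises UnicodeDecodeError on any ill-formed byte sequence.
    data.decode("utf-8", errors="strict")
    # The digest is computed over the bytes themselves, not over any glyphs.
    return hashlib.sha512(data).hexdigest()


if __name__ == "__main__":
    sample = "naïve".encode("utf-8")
    print(len(sample))  # 6: the character ï takes two bytes
    print(utf8_checksum(sample))
```

Note that the digest depends only on the byte sequence, so two texts that look 
alike but use different code points will produce different checksums.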

I have already asked and stated what I need absolutely perfectly. I'm trying to 
figure out whether there is a way to dumb it down, or to say it in a less 
perfect way that ye will grasp better. 

Source code does not need to be in a language like C for a compiler like GCC. 
Source code is source code. It is the code of the source, whatever that may be. 
In this query it was asked of UTF-8.

Then I unrolled this into "bytecode to glyph map": no problem, all perfect, an 
absolutely perfect question. I don't reckon there is any way to dumb it down 
without it ceasing to be this question. 

To help you guys out, since maybe you just need to learn English, I have posted 
An Advanced English Grammar on my GitHub here: 
https://github.com/freedom-foundation/An_Advanced_English_Grammar

Let me import from the Consortium glossary. "Code point" is the only term which 
I had not used, but it is synonymous with my having said "unicode", because 
this is the code defined by the standard: the term for the bytecode which holds 
the integer is "unicode"; the kind of code in those bytes is unicode. If ye had 
spoken English well enough, this would go without saying. Therefore a code 
point is the essence of "unicode" (bytecode). Now, the following are from the 
Consortium and are not my own definitions.

Code Point. (1) Any value in the Unicode codespace; that is, the range of 
integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4, Characters and 
Encoding.) Not all code points are assigned to encoded characters. See code 
point type. (2) A value, or position, for a character, in any coded character 
set.
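
As a rough illustration of definition (1), the following Python sketch tests 
whether an integer lies in the codespace and looks up its General Category. The 
function names are hypothetical, and the category data comes from whichever 
Unicode version is bundled with the Python interpreter in use, not from any 
particular version of the standard.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode codespace


def is_code_point(value: int) -> bool:
    """True if value lies in the codespace 0..10FFFF (hexadecimal)."""
    return 0 <= value <= MAX_CODE_POINT


def general_category(value: int) -> str:
    """Return the General Category of a code point, e.g. 'Lu' or 'Cn'."""
    if not is_code_point(value):
        raise ValueError("outside the Unicode codespace")
    return unicodedata.category(chr(value))


if __name__ == "__main__":
    print(is_code_point(0x110000))     # False: one past the last code point
    print(general_category(0x41))      # Lu: LATIN CAPITAL LETTER A
    print(general_category(0x10FFFF))  # Cn: not assigned to a character
```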

Character. (1) The smallest component of written language that has semantic 
value; refers to the abstract meaning and/or shape, rather than a specific 
shape (see also glyph), though in code tables some form of visual 
representation is essential for the reader’s understanding. (2) Synonym for 
abstract character. (3) The basic unit of encoding for the Unicode character 
encoding. (4) The English name for the ideographic written elements of Chinese 
origin. [See ideograph (2).]

Glyph. (1) An abstract form that represents one or more glyph images. (2) A 
synonym for glyph image. In displaying Unicode character data, one or more 
glyphs may be selected to depict a particular character. These glyphs are 
selected by a rendering engine during composition and layout processing. (See 
also character.)

Unicode. (1) The standard for digital representation of the characters used in 
writing all of the world's languages. Unicode provides a uniform means for 
storing, searching, and interchanging text in any language. It is used by all 
modern computers and is the foundation for processing text on the Internet. 
Unicode is developed and maintained by the Unicode Consortium: 
https://www.unicode.org. (2) A label applied to software internationalization 
and localization standards developed and maintained by the Unicode Consortium.

Now, I was using "Unicode Text Format", a term which may be in use elsewhere; 
as you see, ISO makes claims on Unicode, and so may Microsoft or anybody else. 
In the context in which I used it, this is no problem.

UTF-8. A multibyte encoding for text that represents each Unicode character 
with 1 to 4 bytes, and which is backward-compatible with ASCII. UTF-8 is the 
predominant form of Unicode in web pages. More technically: (1) The UTF-8 
encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 
8,” defined in Annex D of ISO/IEC 10646:2003, technically equivalent to the 
definitions in the Unicode Standard.

UTF-8 Encoding Form. The Unicode encoding form that assigns each Unicode scalar 
value to an unsigned byte sequence of one to four bytes in length, as specified 
in Table 3-6, "UTF-8 Bit Distribution." (See definition D92 in Section 3.9, 
Unicode Encoding Forms.)
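
The bit distribution that this definition refers to can be sketched in a few 
lines of Python. This is a hand-written illustration with a hypothetical 
function name, encode_utf8, and is not code published by the Consortium; 
Python's built-in str.encode("utf-8") does the same job and is used below only 
as a cross-check.

```python
def encode_utf8(scalar: int) -> bytes:
    """Encode one Unicode scalar value as a sequence of 1 to 4 bytes."""
    # Scalar values are code points 0..10FFFF excluding surrogates D800..DFFF.
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar <= 0x7F:      # 0xxxxxxx
        return bytes([scalar])
    if scalar <= 0x7FF:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6),
                      0x80 | (scalar & 0x3F)])
    if scalar <= 0xFFFF:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (scalar >> 18),
                  0x80 | ((scalar >> 12) & 0x3F),
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])


if __name__ == "__main__":
    print(encode_utf8(0x20AC).hex())  # e282ac (EURO SIGN)
    # Cross-check against the interpreter's own encoder.
    print(encode_utf8(0x1F600) == chr(0x1F600).encode("utf-8"))  # True
```

The mapping takes integers to bytes only; no glyph is involved at this stage.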

UCS. Acronym for Universal Character Set, which is specified by International 
Standard ISO/IEC 10646, which is equivalent in repertoire to the Unicode 
Standard.

No problems here on my side.

-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wnUEARYKACcFgmcuJCkJkKkWZTlQrvKZFiEEZlQIBcAycZ2lO9z2qRZlOVCu
8pkAABJqAP0QuHwFKWK844fEoITf0NZ4B127eLtA4U+HRkcv7z7rCgD/defN
eF4YRTr+NLA1mcPA7p/KUTSYqMrqwr6ff6JbuQg=
=rgWY
-----END PGP SIGNATURE-----

publickey - [email protected] - 0x66540805.asc
Description: application/pgp-keys

publickey - [email protected] - 0x66540805.asc.sig
Description: PGP signature
