> From: Hans Åberg <[email protected]>
> Date: Tue, 11 Sep 2018 19:13:28 +0200
> Cc: Henri Sivonen <[email protected]>,
>  [email protected]
>
> > In Emacs, each raw byte belonging
> > to a byte sequence which is invalid under UTF-8 is represented as a
> > special multibyte sequence. IOW, Emacs's internal representation
> > extends UTF-8 with multibyte sequences it uses to represent raw bytes.
> > This allows mixing stray bytes and valid text in the same buffer,
> > without risking lossy conversions (such as those one gets under model
> > 2 above).
>
> Can you give a reference detailing this format?
There's no formal description in English prose, if that's what you meant. The comments, macros, and functions in src/character.[ch] in the Emacs source tree tell most of that story, albeit indirectly, and some additional information can be found in the "Text Representations" section of the Emacs Lisp Reference Manual.
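For illustration, here is a rough Python sketch of the scheme described above. It is based on my reading of src/character.h and is not Emacs code. A raw byte B in the range 0x80..0xFF is treated as the character B + 0x3FFF00 and stored internally as the two bytes 0xC0 | ((c >> 6) & 1) and 0x80 | (c & 0x3F). That is an overlong form with a C0 or C1 lead byte, which valid UTF-8 never contains. The function names are made up for this sketch. "Invalid" here means whatever Python's strict UTF-8 decoder rejects, and that may not match Emacs's own decoder in every edge case.

```python
# Hypothetical sketch of Emacs-style raw-byte handling; names are illustrative.

RAW_BYTE_BASE = 0x3FFF00  # raw byte b (0x80..0xFF) <-> character b + 0x3FFF00


def byte8_to_char(b: int) -> int:
    """Map a stray byte 0x80..0xFF to its (assumed) Emacs character code."""
    assert 0x80 <= b <= 0xFF
    return b + RAW_BYTE_BASE


def encode_byte8(b: int) -> bytes:
    """Two-byte internal form: a C0/C1 lead byte plus a continuation byte.
    These are overlong sequences, so they never collide with valid UTF-8."""
    c = byte8_to_char(b)
    return bytes([0xC0 | ((c >> 6) & 1), 0x80 | (c & 0x3F)])


def valid_utf8_len(data: bytes, i: int) -> int:
    """Length of the valid UTF-8 character starting at data[i], or 0 if none.
    The shortest prefix that decodes strictly is exactly one character."""
    for n in range(1, 5):
        if i + n > len(data):
            break
        try:
            data[i:i + n].decode("utf-8")  # strict error handling by default
            return n
        except UnicodeDecodeError:
            pass
    return 0


def to_internal(data: bytes) -> bytes:
    """Copy valid UTF-8 through unchanged; encode each stray byte as C0/C1 xx."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = valid_utf8_len(data, i)
        if n:
            out += data[i:i + n]
            i += n
        else:
            out += encode_byte8(data[i])  # ASCII is always valid, so b >= 0x80
            i += 1
    return bytes(out)


def from_internal(buf: bytes) -> bytes:
    """Recover the original bytes: C0 pairs hold 0x80..0xBF, C1 pairs 0xC0..0xFF."""
    out = bytearray()
    i = 0
    while i < len(buf):
        lead = buf[i]
        if lead in (0xC0, 0xC1):
            out.append(0x80 | ((lead & 1) << 6) | (buf[i + 1] & 0x3F))
            i += 2
        else:
            out.append(lead)
            i += 1
    return bytes(out)


data = b"caf\xc3\xa9 \xff\xfe ok"  # valid "café" followed by two stray bytes
internal = to_internal(data)       # the stray 0xFF and 0xFE become C1 BF and C1 BE
assert from_internal(internal) == data
```

Because 0xC0 and 0xC1 never occur in valid UTF-8, either as lead or as continuation bytes, from_internal can recognize the raw-byte pairs with a plain byte scan. That is what makes the round trip lossless even when stray bytes and valid text are mixed.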

