Re: Unicode String Models

Eli Zaretskii via Unicode Tue, 11 Sep 2018 10:25:44 -0700

> From: Hans Åberg <[email protected]>
> Date: Tue, 11 Sep 2018 19:13:28 +0200
> Cc: Henri Sivonen <[email protected]>,
>  [email protected]
> 
> > In Emacs, each raw byte belonging
> > to a byte sequence which is invalid under UTF-8 is represented as a
> > special multibyte sequence.  IOW, Emacs's internal representation
> > extends UTF-8 with multibyte sequences it uses to represent raw bytes.
> > This allows mixing stray bytes and valid text in the same buffer,
> > without risking lossy conversions (such as those one gets under model
> > 2 above).
> 
> Can you give a reference detailing this format?


There's no formal description as English text, if that's what you
meant.  The comments, macros and functions in the files
src/character.[ch] in the Emacs source tree tell most of that story,
albeit indirectly, and some additional info can be found in the
section "Text Representation" of the Emacs Lisp Reference manual.
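
As a rough illustration of the idea described above (this is not Emacs
code), the sketch below keeps valid UTF-8 as-is and rewrites each stray
byte 0x80..0xFF as a two-byte sequence led by C0 or C1, which can never
occur in valid UTF-8.  The lead-byte values are believed to match the
BYTE8_STRING macro in src/character.h, but treat them as an assumption
and check the source; all function names here are hypothetical.

```python
# Sketch only: the C0/C1 layout is an assumption to verify against
# src/character.h; the function names are made up for this example.

def raw_byte_to_internal(b: int) -> bytes:
    """Encode a stray byte 0x80..0xFF as a two-byte C0/C1-led sequence."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF need escaping")
    return bytes([0xC0 | ((b >> 6) & 0x01), 0x80 | (b & 0x3F)])


def decode_to_internal(data: bytes) -> bytes:
    """Keep valid UTF-8 unchanged; escape every byte that is not part of it."""
    out = bytearray()
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out += raw_byte_to_internal(cp - 0xDC00)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def internal_to_bytes(internal: bytes) -> bytes:
    """Invert decode_to_internal, restoring the original byte stream."""
    out = bytearray()
    i = 0
    while i < len(internal):
        lead = internal[i]
        if lead in (0xC0, 0xC1):
            # Two-byte escape: rebuild the raw byte from lead and trail.
            trail = internal[i + 1]
            out.append(0x80 | ((lead & 0x01) << 6) | (trail & 0x3F))
            i += 2
        else:
            out.append(lead)
            i += 1
    return bytes(out)


data = "é".encode("utf-8") + b"\xff\x80" + b"ok"
internal = decode_to_internal(data)
restored = internal_to_bytes(internal)
```

Because C0 and C1 never appear in well-formed UTF-8, the escaped bytes
cannot be confused with real text, so the conversion round-trips
without loss.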
