Re: A sign/abbreviation for "magister"

Philippe Verdy via Unicode Sat, 03 Nov 2018 15:38:55 -0700

>
> Unlike NFKC and NFKD, NFLC and NFLD would be an extensible superset
> based on MUTABLE character properties (these can also be "decomposition
> mappings", except that once a character is added to the new property file,
> it won't be removed, and can have some stability as well: the
> decision to "deprecate" old encodings can only be made if there's a new
> recommendation, and if that recommendation ever changes and is itself
> deprecated, the previous "legacy decomposition mappings" can still be
> decomposed again to the newly recommended decompositions). Unlike NFKC and
> NFKD, a "legacy decomposition" is not "final" in all future versions, and a
> future version may remap them by just adding new entries for the new
> characters considered to be "legacy" and no longer recommended. This new
> properties file would allow evolution and adaptation to human languages,
> and would allow correcting past errors in the standard. This file should
> have this form:
>
>   # deprecated codepoint(s) ; new preferred sequence ;
>   # Unicode version in which it was deprecated
>   101234 ; 101230 0300... ; 10.0
>
> This file can also be used to deprecate some old variation sequences, or
> some old clusters made of multiple characters that are not deprecated in
> isolation.
>
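
As a rough illustration of the format quoted above, here is a minimal
Python sketch of a reader for such a file. The names LegacyMapping and
parse_legacy_mappings are hypothetical, and so is the sample entry (it
reuses the existing compatibility mapping of U+0149 with an illustrative
version number); no such file exists in the UCD today.

```python
from typing import List, NamedTuple, Tuple


class LegacyMapping(NamedTuple):
    """One line of the proposed file (hypothetical structure)."""
    source: str               # deprecated code point(s), as a string
    target: str               # new preferred sequence, as a string
    version: Tuple[int, ...]  # Unicode version of the deprecation


def parse_legacy_mappings(text: str) -> List[LegacyMapping]:
    """Parse 'source ; target ; version' lines, ignoring '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        src, dst, ver = (field.strip() for field in line.split(";"))
        entries.append(LegacyMapping(
            source="".join(chr(int(cp, 16)) for cp in src.split()),
            target="".join(chr(int(cp, 16)) for cp in dst.split()),
            version=tuple(int(part) for part in ver.split(".")),
        ))
    return entries


# Illustrative data only: not an actual or proposed entry.
SAMPLE = """\
# deprecated codepoint(s) ; new preferred sequence ; version
0149 ; 02BC 006E ; 5.2
"""
```

Both columns may hold several code points, so they are kept as strings
rather than single characters.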


Another note:

- this new decomposition mapping file for NFLC and NFLD, where NFLC is
defined to be NFC(NFLD), has some stability requirements, and it must be
guaranteed that NFD(NFLD) = NFD: the "legacy mapping forms" must be a
conforming process respecting the canonical equivalences.

- Unlike in the main UCD file for canonical decompositions, the
decompositions listed there are not limited to map one character to one or
two characters.

- The first column should be given in NFC form; the NFD form may also be
used, as this does not change the result. It is NOT required that the 1st
column be in NFKC or NFKD form (so the decompositions previously
recommended by a "compatibility mapping" in the main UCD can be ignored: it
was just a suggestion, and a requirement only for NFKC and NFKD). This
allows NFLC and NFLD to correct past errors in the permanently frozen NFKC
and NFKD decompositions.

- the mapping done here is permanent but versioned (by the first version of
Unicode deprecating a character or sequence). Being permanent means that
the deprecation cannot be removed, but it can still be changed if the
target string (preferably listed in NFC form) contains some newly
deprecated characters (that will be added separately).

- if the target of the mapping contains other deprecated characters or
sequences (added to the same file), the decompositions listed there become
recursive: a derived datafile can be produced listing only the new
recommended mappings.

- if a source string "SATB" is canonically equivalent to "SBTA", and "SA"
is listed as a legacy sequence mapped to be replaced by "X" in this file,
then the NFLD process will not just decompose "SATB" into NFD("XTB"), but
will also decompose "SBTA" into NFD("XBT").

- if a source string "SATB" is NOT canonically equivalent to "SBTA", and
"SA" is listed as a legacy sequence mapped to be replaced by "X" in this
file, then the NFLD process will decompose "SATB" into NFD("XTB"), but
will not automatically decompose "SBTA" into NFD("XBT").
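
The points above can be sketched in Python as follows. This is only a
minimal sketch under stated assumptions: the function names (nfld, nflc,
derive_final_mappings) are hypothetical, the mapping table is passed in as
a plain dict, and matching is done on contiguous substrings of the NFD
form. It therefore treats precomposed and decomposed spellings alike, but
it does not attempt the discontiguous matching that a sequence such as
"SBTA" would need.

```python
import unicodedata
from typing import Dict


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


def _replace_once(text: str, table: Dict[str, str]) -> str:
    """One left-to-right pass, preferring the longest legacy source."""
    sources = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src in sources:
            if text.startswith(src, i):
                out.append(table[src])
                i += len(src)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def nfld(text: str, mappings: Dict[str, str], max_rounds: int = 100) -> str:
    """Apply legacy mappings recursively on the NFD form (sketch only)."""
    # Normalizing both sides lets an NFC or NFD first column behave the same.
    table = {nfd(s): nfd(t) for s, t in mappings.items() if s}
    current = nfd(text)
    for _ in range(max_rounds):
        replaced = nfd(_replace_once(current, table))
        if replaced == current:
            return current
        current = replaced
    raise ValueError("legacy mappings do not converge (cycle?)")


def nflc(text: str, mappings: Dict[str, str]) -> str:
    """NFLC defined as NFC(NFLD), as stated in the first point."""
    return unicodedata.normalize("NFC", nfld(text, mappings))


def derive_final_mappings(mappings: Dict[str, str]) -> Dict[str, str]:
    """Derived table listing only the fully resolved targets."""
    return {src: nfld(dst, mappings) for src, dst in mappings.items()}


# Hypothetical entries using private-use code points as stand-ins.
DEMO = {"\uE000": "\uE001x", "\uE001": "y"}
```

With the DEMO table, nfld("a\uE000", DEMO) first rewrites U+E000 to
U+E001 followed by "x", then rewrites U+E001 to "y" on the next round; the
round limit is just a guard against mappings that form a cycle.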

Then the CLDR project can use NFL(C/D) instead of NFK(C/D) as a better
source for deriving collation elements (in the DUCET or root locale):
NFL(C/D) will follow the new recommendations and will correctly adapt the
collation orders for legacy encodings. Tailored collations (per-locale) are
not required to use the compatibility mappings in the main UCD file, or in
this file; they'll use them only if they are based on the DUCET or the
collation order of the "root" locale. For that purpose, tailored collations
may specify an alternate set of "compatibility or legacy mappings" (to apply
after NFC or NFD normalization, which is still required).

Maybe the CLDR project would like these derived collation elements to be
orderable (so that it can infer and order the new relative weights needed
for ordering strings containing "legacy characters"), but that may require
another column in the legacy mappings datafile (in my opinion the "Unicode
version" field already offers a suitable relative ordering by default).
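
To illustrate ordering by the "Unicode version" field, here is a small
Python sketch. The helper names and the sample entries are hypothetical,
and breaking ties by the source string is my own assumption, not something
the field itself provides. The one real pitfall it shows is that versions
have to be compared numerically, since as plain strings "10.0" would sort
before "9.0".

```python
from typing import Iterable, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '10.0' into (10, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def order_legacy_entries(
    entries: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Order (source, version) pairs by version, then by source string."""
    return sorted(entries, key=lambda e: (version_key(e[1]), e[0]))


# Hypothetical entries using private-use code points as stand-ins.
DEMO_ENTRIES = [("\uE001", "10.0"), ("\uE002", "9.0"), ("\uE000", "9.0")]
```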
