I think the idea being considered at the outset was not so complex as these (and indeed, the point of the character was to avoid making these kinds of decisions). There was a desire, for some reason, to be able to chop up a string into equal-length pieces or something similar, and some of those divisions might wind up between bases and diacritics or who knows where else. Rather than have to work out acceptable places to put the character, the request was for a no-op character that could safely be plopped *anywhere*, even in the middle of combinations like that.
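To make the "equal-length pieces" case concrete, here is a rough Python sketch of how a fixed-width chop can land between a base and its diacritic. The function names `chop` and `splits_before_mark` are made up for illustration, and the check is deliberately crude: it only looks at the canonical combining class, so it would miss marks whose class is 0.

```python
import unicodedata

def chop(text, size):
    """Cut text into pieces of `size` code points (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def splits_before_mark(text, size):
    """Return the cut positions that fall directly before a combining mark.

    Crude assumption: a mark is any code point with a nonzero canonical
    combining class. Marks with class 0 are not detected.
    """
    return [
        i
        for i in range(size, len(text), size)
        if unicodedata.combining(text[i]) != 0
    ]

# "resume" with both accents written as base letter + U+0301 COMBINING ACUTE
sample = "re\u0301sume\u0301"

pieces = chop(sample, 2)
bad_cuts = splits_before_mark(sample, 2)
```

With a piece size of 2, the first cut falls between the "e" and its acute accent, which is exactly the sort of position a plop-anywhere no-op character would have to tolerate.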

~mark

On 6/23/19 4:24 AM, Richard Wordingham via Unicode wrote:
On Sat, 22 Jun 2019 23:56:50 +0000
Shawn Steele via Unicode <unicode@unicode.org> wrote:

+ the list.  For some reason the list's reply header is confusing.

From: Shawn Steele
Sent: Saturday, June 22, 2019 4:55 PM
To: Sławomir Osipiuk <sosip...@gmail.com>
Subject: RE: Unicode "no-op" Character?

The original comment about putting it between the base character and
the combining diacritic seems peculiar.  I'm having a hard time
visualizing how that kind of markup could be interesting?

There are a number of possible interesting scenarios:

1) Chopping the string into user perceived characters.  For example,
the Khmer sequences of COENG plus letter are named sequences.  Akin to
this is identifying resting places for a simple cursor, e.g. allowing it
to be positioned between a base character and a spacing, unreordered
subscript.  (This last possibility overlaps with rendering.)

2) Chopping the string into collating elements.  (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping.  I suspect no-op characters would disrupt this
renormalisation; CGJ may legitimately be used to affect rendering this
way, even though it is supposed to have no other effect* on rendering.)

3) Chopping the string into default grapheme clusters.  That
separates a coeng from the following character with which it
interacts.
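
As a small illustration of the renormalisation concern in scenario 2, the Python sketch below shows how a character with canonical combining class 0 placed between two marks stops canonical reordering from moving them past each other. CGJ (U+034F) stands in for such a character here; the helper name `nfd` and the sample strings are only illustrative, and nothing here says how any particular renderer or shaping engine behaves.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def nfd(text):
    """Canonical decomposition, which includes canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)

# a + COMBINING ACUTE ACCENT (class 230) + COMBINING DOT BELOW (class 220)
plain = "a\u0301\u0323"

# The same marks with CGJ between them
with_cgj = "a\u0301" + CGJ + "\u0323"

reordered = nfd(plain)     # dot below is moved ahead of the acute
blocked = nfd(with_cgj)    # CGJ separates the marks, so the order is kept
```

Without CGJ the two marks are sorted by combining class; with it, the sequence comes back unchanged, which is the sense in which an inserted character can disturb the mark order that normalisation would otherwise produce.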

*Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?

Richard.

