There's really no inherent need for many spacing combining marks to have a base character, at least not for the ones that do not reorder and that don't overhang the base character's glyph.

As far as I can tell, it's largely a convention that originally helped identify clusters and other places that lack break opportunities. But now that we have separate properties for segmentation, it's not strictly necessary to overload the combining property for that purpose.

In your example, why do you need the ZWJ and dotted circle?

Originally, just applying a combining mark to an NBSP should normally show the mark by itself. If a font insists on inserting a dotted circle glyph, that's not required from a conformance perspective; it's just something that's seen as helpful (to most users).
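For concreteness, here is a minimal Python sketch that spells out the two sequences under discussion and lists each code point's General_Category. The choice of U+1A63 TAI THAM VOWEL SIGN AA as the spacing mark is only an assumption for illustration (the original message does not name a specific mark), and the helper name `describe` is hypothetical. It only inspects character properties; it says nothing about how any particular font or renderer will display the text.

```python
import unicodedata

NBSP = "\u00A0"           # NO-BREAK SPACE
ZWJ = "\u200D"            # ZERO WIDTH JOINER
DOTTED_CIRCLE = "\u25CC"  # DOTTED CIRCLE
# Assumption: an example Tai Tham spacing mark, not one named in the thread.
SPACING_MARK = "\u1A63"   # TAI THAM VOWEL SIGN AA


def describe(seq):
    """Return (code point, General_Category, name) for each character in seq."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in seq
    ]


# The simpler sequence: the mark applied directly to NBSP.
plain = NBSP + SPACING_MARK
# The sequence from the question: NBSP, ZWJ, dotted circle, then the mark.
workaround = NBSP + ZWJ + DOTTED_CIRCLE + SPACING_MARK

for label, seq in (("NBSP + mark", plain),
                   ("NBSP + ZWJ + dotted circle + mark", workaround)):
    print(label)
    for code_point, category, name in describe(seq):
        print(f"  {code_point}  {category}  {name}")
```

In both sequences the mark itself is unchanged in the backing store, so a search for the mark will find it either way; the sequences differ only in what precedes it.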

A./

On 7/21/2019 4:03 PM, Richard Wordingham via Unicode wrote:
I've been transcribing some Pali text written on palm leaf in the
Tai Tham script.  I'm looking for a way of reflecting the line
boundaries in a manuscript in a transcription.  The problem is that
lines sometimes start or end with an isolated spacing mark.  I want
my text to be searchable and therefore encoded in Unicode.  (I
appreciate that there is a trade-off between searchability and showing
line boundaries.  The unorthodox spelling is also a problem.)

How unreasonable is it for a font to render

<NBSP, ZWJ, U+25CC DOTTED CIRCLE, spacing_mark>

as just the spacing mark?  Some rendering systems give the font no way
of distinguishing dotted circles in the backing store from dotted
circles added by the renderer, so this technique is not Unicode
compliant.

An alternative solution is to have a parallel font (or, more neatly, a
feature) that renders some base character (or sequence) as a zero-width
non-inking character.  This, however, would violate that character's
identity.  I suspect there is no Unicode-compliant solution.

Richard.

