From: "Jony Rosenne" <[EMAIL PROTECTED]>
Peter Kirk
You mean, you would represent a black e with a red acute accent as
something like "e", ZWJ, "<red>", IBC, acute, "</red>"? That looks like
a nightmare for all kinds of processing and a nightmare for rendering.

No, it is more like <forecolor:black, combiningcolor:red> "e" "acute". And there is no Unicode decision against it.

And there is still no decision on whether this invisible base character will be added. For now it is only under public review, to address the first issue: rendering isolated non-spacing combining marks that currently have no spacing variant. (I think it's a good idea, as it would avoid having to encode most of the missing spacing variants, notably for the non-generic L/G/C combining marks.)


Note that your suggestion of:
<forecolor:black, combiningcolor:red> "e" "acute"
should also work with any normalized form of the same text, i.e. with:
<forecolor:black, combiningcolor:red> "e with acute"
where the combining mark is precomposed with its base letter. The issue is that this becomes tricky for renderers, which would need to decompose normalized strings again before applying the style.
Basically, I prefer Peter's solution with:
"e", ZWJ?, "<red>", IBC, acute, "</red>"
which is more independent of the normalization form. Then the question is whether the text within the <red>...</red> markup should combine visually when rendered.
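
To illustrate why the precomposed form is tricky for a renderer, here is a minimal Python sketch using the standard unicodedata module. The helper name split_base_and_marks is hypothetical, and it only assumes that a renderer wants each base character grouped with its combining marks so that the two can be styled separately. It relies on the canonical combining class, so marks whose class is 0 would be treated as bases.

```python
import unicodedata

def split_base_and_marks(text):
    """Decompose text (NFD), then pair each base character with its marks."""
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        # combining() returns the canonical combining class (0 for a base).
        if unicodedata.combining(ch) and clusters:
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            clusters.append((ch, ""))
    return clusters

decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Both spellings yield the same (base, marks) pair once decomposed.
print(split_base_and_marks(decomposed))
print(split_base_and_marks(precomposed))
```

Either spelling gives one cluster with base "e" and the combining acute as its mark, but only after the renderer has undone the composition itself.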


For now I see the proposed IBC (it has no name yet) only as a way to turn non-spacing combining marks into spacing non-combining variants when these don't exist separately in Unicode (so it would not be recommended for the non-spacing acute accent, which already has a spacing version that needs no leading IBC).
Technically, if an IBC character is added, a renderer will not necessarily render <IBC, non-spacing combining acute> the same way as <spacing non-combining acute accent>, even if it ideally should.
In the previous sentence, "should" means that the existing spacing non-combining marks remain the standard legacy way to encode them, and that they normally don't combine when rendered after a base letter, even if there is markup around them (unless this markup explicitly says that they should combine):


If I take the above example,
"e", ZWJ?, "<red>", IBC, acute, "</red>"
the same rich text should also be renderable as plain text, without the markup, as if it were:
"e", ZWJ?, IBC, acute
i.e. (given the "should" above) as if it were also:
"e", ZWJ?, spacing acute
I placed the "?" after ZWJ to show that something would be needed there to override the non-combining, spacing behavior of the spacing acute character in this last text. Without it, the text:
"e", spacing acute
or equivalently (given the "should" above):
"e", IBC, combining acute
would not be allowed to render as a combined e with an acute: two separate glyphs would be rendered, and two separate character entities interpreted (as they are today in legacy plain text).


So the question of how to apply markup to combining marks remains open. The proposed IBC alone cannot solve it, unless there is agreement that ZWJ immediately followed by IBC should be rendered as if neither were present. But in that case a spacing acute becomes semantically and graphically distinct from <IBC, combining acute>. This will happen in any case with the normalization forms: because of the Unicode stability policy, existing spacing marks cannot be given a decomposition to <IBC, combining mark>, so their NFD and NFKD forms must remain as they are today.

I also note that IBC is intended to remove the need to use a standard SPACE as the base character for building a spacing variant of a combining mark when no standard spacing variant is encoded in Unicode. (This is a legacy hack, which causes various problems because of whitespace normalization in many plain-text formats and applications, and in XML and HTML, and because of the special word-breaking behavior of spaces.) I don't see it as a way to deprecate the existing block of spacing marks.
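
As a small illustration of the SPACE-as-base hack and of how fragile it is, here is a Python sketch using only the standard unicodedata module. The variable names and the codepoints helper are mine, chosen for illustration. It shows that the existing spacing acute (U+00B4) carries a compatibility decomposition to exactly this SPACE-based sequence, and that ordinary whitespace trimming then strips the base away.

```python
import unicodedata

SPACING_ACUTE = "\u00b4"    # ACUTE ACCENT (spacing)
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# Legacy hack: SPACE used as the base for a combining mark.
legacy = " " + COMBINING_ACUTE

# NFKD maps U+00B4 to this same SPACE-based sequence; NFD leaves it unchanged.
nfkd = unicodedata.normalize("NFKD", SPACING_ACUTE)
nfd = unicodedata.normalize("NFD", SPACING_ACUTE)

# Whitespace trimming removes the base and leaves an isolated combining mark.
trimmed = legacy.strip()

def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print("NFKD:   ", codepoints(nfkd))
print("NFD:    ", codepoints(nfd))
print("trimmed:", codepoints(trimmed))
```

A dedicated invisible base that is not whitespace would not be touched by this kind of trimming, which is the motivation described above.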



