Peter Kirk said:

> Tell Microsoft! (See Noah Levitt's posting.)

Indeed.

> 
> If this is indeed "The standard way to do what you want", then the 
> standard needs to make it clear that the sequence of <space, combining 
> mark> or <NBSP, combining mark> has the properties which I want, i.e. it 
> has the width of the combining mark alone, and not the full width of a 
> space, 

Display width is up to the implementation and the font, and is not
something that the Unicode Standard should mandate, IMO. Mandating it
would step over the bounds of plain text content.

> and does not expand for justification,

This is likewise an issue for the implementation. The Unicode Standard
does not mandate how a typographic implementation must implement
interword, intercharacter, or any other kind of justification.

> is not a line breaking 
> opportunity, 

This, however, *is* specified. See UAX #14, in the section discussing
CM (the line break class associated with combining marks):

"If U+0020 SPACE is used as a base character, it is treated as
AL instead of SP."

What that means is that the SPACE is not run through the line
break rules as lb=SP. It is handled as lb=AL instead, which puts
it in the same class as ordinary letters for the purposes of
determining a line break opportunity.
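
As a rough illustration of the rule quoted above, here is a minimal
Python sketch. It is not a UAX #14 implementation: it knows only three
classes (SP, CM, and AL for everything else), and the names
is_combining_mark and simplified_break_classes are made up for this
example. It also assumes that general categories Mn, Mc, and Me are a
fair stand-in for lb=CM.

```python
import unicodedata


def is_combining_mark(ch):
    # Assumption: general categories Mn, Mc, Me approximate lb=CM.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def simplified_break_classes(text):
    """Return a toy line break class (SP, CM, or AL) per code point."""
    classes = []
    for i, ch in enumerate(text):
        if is_combining_mark(ch):
            classes.append("CM")
        elif ch == "\u0020":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # SPACE used as a base for a combining mark is treated as AL.
            if nxt and is_combining_mark(nxt):
                classes.append("AL")
            else:
                classes.append("SP")
        else:
            # Everything else is lumped into AL in this sketch.
            classes.append("AL")
    return classes
```

For the string "a", SPACE, U+0301, "b" this gives AL, AL, CM, AL,
whereas plain "a b" gives AL, SP, AL.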

Of course, a conformant Unicode implementation is not *required*
to implement line-breaking as specified in UAX #14. But if it
claims it is doing so, and does not handle SP+combining_mark
combinations this way, then it is a nonconformant implementation
of line-breaking.

> does not in fact have any of the properties of a space.

It does, in fact, have some of the properties of a space, since
it is U+0020 SPACE, after all. But the important fact is that
implementations are supposed to be implementing the semantics
of the combining character sequence taking the SPACE as the base
and any following *non*-spacing combining mark as applied to
that base. If an implementation then produces inappropriate
rendering or line-breaking for that sequence, that is, as Kent
said, an issue to take up with the implementers.
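
To make "SPACE as the base" concrete, here is a small Python sketch
that groups each code point with any nonspacing marks that follow it.
The function name combining_sequences is hypothetical, and the sketch
looks only at general category Mn. It is not a full segmentation
algorithm.

```python
import unicodedata


def combining_sequences(text):
    """Group each base with the nonspacing marks (Mn) that follow it."""
    sequences = []
    for ch in text:
        if sequences and unicodedata.category(ch) == "Mn":
            # Attach the mark to the preceding base, even if it is SPACE.
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences
```

Given "a", SPACE, U+0308, "b", this returns three sequences, the
middle one being SPACE followed by U+0308.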

> I 
> expect to see such a clarification in the next edition of the Unicode 
> Standard.

See above for the reasons why this behavior is unlikely to be any
more constrained by the standard than it already is.

A point I keep trying to make, but which often gets overlooked
by people trying to code Unicode mechanisms for dealing with
edge cases, is that the design goal of the Unicode Standard is,
and always has been, to represent *plain text content*. It
cannot, and should not, IMO, deal with requirements for
representing arbitrarily fine distinctions of typographical
detail in all manuscripts and other documents in all writing
systems of the world.

Continuing to require that the Unicode Standard *must* specify
some inherent mechanism for indicating the display width of
combining character sequences clearly steps over the bounds
of what is required to represent plain text content.

--Ken



