Peter Kirk said:

> Tell Microsoft! (See Noah Levitt's posting.)

Indeed.

> If this is indeed "The standard way to do what you want", then the
> standard needs to make it clear that the sequence of <space, combining
> mark> or <NBSP, combining mark> has the properties which I want, i.e. it
> has the width of the combining mark alone, and not the full width of a
> space,

This is up to the implementation and the font, and is not something that the Unicode Standard should mandate, IMO. This steps over the bound of the plain text content.

> and does not expand for justification,

This is likewise an issue for the implementation. The Unicode Standard does not mandate how a typographic implementation must implement interword, intercharacter, or any other kind of justification.

> is not a line breaking
> opportunity,

This, however, *is* specified. See UAX #14, in the section discussing CM (the line break class associated with combining marks):

"If U+0020 SPACE is used as a base character, it is treated as AL instead of SP."

What that means is that rather than sifting down through the line break rule determinations according to a lb=SP category, it is then handled as lb=AL, which puts it in the same class with ordinary letters for the purposes of determining a line break opportunity.

Of course, a conformant Unicode implementation is not *required* to implement line-breaking as specified in UAX #14. But if it claims it is doing so, and does not handle SP+combining_mark combinations this way, then it is a nonconformant implementation of line-breaking.

> does not in fact have any of the properties of a space.

It does, in fact, have some of the properties of a space, since it is U+0020 SPACE, after all. But the important fact is that implementations are supposed to be implementing the semantics of the combining character sequence taking the SPACE as the base and any following *non*-spacing combining mark as applied to that base.
If the implementations then result in inappropriate rendering or line-breaking for that sequence, that is, as Kent said, an issue to take up with the implementers.

> I
> expect to see such a clarification in the next edition of the Unicode
> Standard.

See above for the reasons why it is unlikely to be any more constrained by the standard than it already is.

A point I keep trying to make, but which often gets overlooked by people trying to code Unicode mechanisms for dealing with edge cases, is that the design goal of the Unicode Standard is, and always has been, to represent *plain text content*. It cannot, and should not, IMO, deal with requirements for representing arbitrarily fine distinctions of typographical detail in all manuscripts and other documents in all writing systems of the world.

Continuing to require that the Unicode Standard *must* specify some inherent mechanism for indicating the display width of combining character sequences clearly steps over the bounds of what is required to represent plain text content.

--Ken
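
For illustration only, the UAX #14 rule quoted above (SPACE used as a base character is treated as AL instead of SP) can be sketched in Python. This is a toy model under stated assumptions, not an implementation of UAX #14: it knows only three classes (SP, CM, AL), it allows a break only after a run of ordinary spaces, and the function names are hypothetical. It follows the wording quoted in this message and makes no claim about other versions of the annex.

```python
import unicodedata


def is_combining_mark(ch):
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def toy_line_break_classes(text):
    """Assign a toy class (SP, CM or AL) to each code point.

    Assumption: everything that is not a space or a combining
    mark is lumped into AL, which real UAX #14 does not do.
    """
    classes = []
    for i, ch in enumerate(text):
        if is_combining_mark(ch):
            classes.append("CM")
        elif ch == " ":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # The quoted rule: SPACE used as a base for a
            # combining mark is treated as AL instead of SP.
            if nxt and is_combining_mark(nxt):
                classes.append("AL")
            else:
                classes.append("SP")
        else:
            classes.append("AL")
    return classes


def toy_break_opportunities(text):
    """Return indices i where a break is allowed before text[i].

    In this toy model the only opportunity is after an SP that
    is followed by something other than another SP.
    """
    classes = toy_line_break_classes(text)
    return [
        i
        for i in range(1, len(text))
        if classes[i - 1] == "SP" and classes[i] != "SP"
    ]


if __name__ == "__main__":
    # SPACE + U+0301 COMBINING ACUTE ACCENT, then an ordinary space.
    print(toy_break_opportunities("a \u0301 b"))  # [4]
    print(toy_break_opportunities("a b"))         # [2]
```

In the first example the space at index 1 carries the combining mark, so it is classed as AL and yields no break opportunity; only the ordinary space at index 3 does.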