On Wednesday, August 06, 2003 12:36 PM, Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > The NFD decompositions of spacing marks are already defined as a SPACE
> > plus a non-spacing combining character.
>
> Philippe, please! Those are *compatibility* decompositions. The
> normal form NFD only uses *canonical* decompositions. And there is no
> such thing as "NFD decompositions".

Sorry for the confusion. Still, even with an NFKD decomposition, it is clear that these mappings already define combining sequences with SPACE used as the base character. The important point is that SPACE already serves as the base character that holds a combining mark, and Unicode processing should never break in the middle of a combining sequence, even when the base character is a SPACE.

It's true that not all (only most) non-spacing combining characters have a spacing counterpart. But where one exists, the decomposition given in the UCD already indicates that the SPACE should be preserved, and not treated as a break opportunity, when it is followed by a combining character.

This is not stated very clearly in the specification of the break properties, where sequences of spaces are often unified, but some rules already make it clear: a SPACE is a word separator only if it is not used in a combining sequence, and break opportunities are computed between grapheme clusters, so they cannot split a combining sequence.

OK, there's a problem with HTML, where sequences of whitespace are normalized to a single whitespace. This effectively creates a problem if a combining character is used after two spaces, the first being a word separator or indenting space and the second being the base for the combining sequence. For now, most text can be written using spacing diacritics instead of combining sequences starting with SPACE, and this will work in HTML.
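To make the canonical versus compatibility distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The choice of U+00B4 ACUTE ACCENT as the sample character, and the helper name codepoints, are my own for illustration and do not come from the thread.

```python
import unicodedata

# Sample spacing diacritic, chosen only for illustration.
ACUTE = "\u00b4"  # ACUTE ACCENT

# Raw UCD decomposition mapping. A "<compat>" tag marks a compatibility
# mapping, which NFD ignores and NFKD applies.
mapping = unicodedata.decomposition(ACUTE)

nfd = unicodedata.normalize("NFD", ACUTE)    # canonical decompositions only
nfkd = unicodedata.normalize("NFKD", ACUTE)  # canonical + compatibility


def codepoints(text):
    """Hypothetical helper: render a string as U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(mapping)           # <compat> 0020 0301
print(codepoints(nfd))   # U+00B4
print(codepoints(nfkd))  # U+0020 U+0301
```

NFD leaves the spacing accent untouched, while NFKD yields SPACE followed by COMBINING ACUTE ACCENT, which is the SPACE-based combining sequence discussed above.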
For those diacritics which do not have a spacing counterpart already defined, there remains a problem which can only be solved by placing a separating format control between the first (separating) space and the second (base) space. I think this could be a ZWSP, like this:

    ...<SPACE>, <ZWSP>, <SPACE, COMBINING-ACUTE-ACCENT>... ???

This assumes that the whitespace normalization algorithm does not include <ZWSP> in the whitespace sequence and instead treats it separately. A conforming HTML or XML processor should not fold it into the sequence, since it should unify only sequences of <SPACE>, <TAB>, <CR>, <LF>, and only according to the whitespace properties of the containing element that control the normalization of XML whitespace sequences (leading, trailing, line-break preservation, tabulation).

I did not verify this completely in XSLT, but it should hold there too for this kind of processing (hoping that ZWSP is not counted as part of whitespace sequences).

-- Philippe.
Spam is not tolerated: any unsolicited message will be reported to your Internet service providers.
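P.S. A rough Python sketch of the whitespace-folding issue and the ZWSP idea, for anyone who wants to experiment. It models only the folding step, under the assumption stated in this message that a processor unifies runs of SPACE, TAB, CR and LF and nothing else. The function name collapse_whitespace and the sample strings are hypothetical; this is not the behaviour of any particular HTML, XML or XSLT implementation.

```python
import re

ZWSP = "\u200b"             # ZERO WIDTH SPACE
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# Assumption: only these four characters take part in whitespace folding.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text):
    """Hypothetical model of folding: each whitespace run becomes one SPACE."""
    return _WS_RUN.sub(" ", text)


# Separator space directly followed by a base space carrying the accent.
plain = "word" + " " + " " + COMBINING_ACUTE + "next"
# Same text, with ZWSP keeping the two spaces in separate runs.
guarded = "word" + " " + ZWSP + " " + COMBINING_ACUTE + "next"

print(collapse_whitespace(plain).count(" "))    # 1: separator and base merged
print(collapse_whitespace(guarded).count(" "))  # 2: both spaces survive
```

Without the ZWSP, the two spaces fold into one, so the single remaining SPACE has to act as both word separator and base character. With the ZWSP between them, each space forms its own run and the text passes through unchanged, provided the processor really does leave ZWSP out of its whitespace set.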