the
> solution with
> SPACE is really tricky due to the special treatment of SPACE notably
> in HTML, SGML, XML

I disagree. There are a few different things that happen with whitespace in
such technologies. Some of these only apply to elements that do not allow
any character data apart from whitespace to appear directly within them, and
hence are not an issue here. Some happen at relatively high level of
processing, e.g. rendering (not parsing) of HTML, and as such should
correctly process spaces combined with combining characters.

There are only two theoretical problems that I can see here, the first is
that a whitespace character other than space gets converted to space by
attribute value normalisation, and that this changes the meaning of the text
in some way. This could only occur if the combining character were the first
character in a line of text, which is quite a nonsensical construct to begin
with.

The other would be with names, qnames, nmtokens and such. These are not
normal textual content; they are human-readable constructs that are based on
normal text because that makes it easier for some developers to work at a
plain-text level (if they speak the natural language that the human-readable
constructs were based on). Support for the linguistic oddity of a dialectic
divorced from the context in which it would normally exist would have little
justification in this place except for fulfilling the general goal of
"completeness". Completeness is a laudable aim of course, but extreme
edge-cases need only be brought in if they are both safe and cheap. Anyone
designing an XML application who frequently considers isolated diacritics as
the most natural choice in part of such tokens probably needs to take a
couple of weeks holidays before continuing the design. Of course some of the
characters that could be considered to be precomposed isolated diacritics
are banned from use in nmtokens anyway.


Reply via email to