On 06/08/2003 03:54, Philippe Verdy wrote:

On Wednesday, August 06, 2003 1:59 AM, Curtis Clark <[EMAIL PROTECTED]> wrote:



on 2003-08-05 15:31 Peter Kirk wrote:


Thank you, Mark. This helps to clarify things, but still doesn't
explicitly answer my question of how to encode "a sentence like "In
this language the diacritic ^ may appear above the letters ...",
but instead of ^ I want to use a combining character" and want to
display exactly one space before the combining character - do I
encode two spaces or one?


In this language the diacritic ̊ may appear above the letters...

Two spaces, at least in Thunderbird Mail.



The NFD decompositions of spacing marks are already defined as a SPACE plus a non-spacing combining character. ...

Really? It looks to me as if U+00B4 and U+02D8 to U+02DD have only compatibility equivalences to space plus diacritic, and U+005E and U+0060 don't even have compatibility equivalences.
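
For anyone who wants to check this against the character database, here is a minimal sketch using Python's standard unicodedata module. The helper name decomposition_kind and the CANDIDATES list are my own, for illustration only; the results reflect whatever Unicode version your Python build ships.

```python
import unicodedata

# Spacing diacritics mentioned above: U+00B4, U+02D8..U+02DD, U+005E, U+0060.
CANDIDATES = [0x00B4, *range(0x02D8, 0x02DE), 0x005E, 0x0060]


def decomposition_kind(cp: int) -> str:
    """Classify a code point's decomposition mapping (hypothetical helper).

    Returns 'none' if there is no mapping, 'compat' if the mapping carries
    a formatting tag such as <compat>, and 'canonical' otherwise.
    """
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping:
        return "none"
    return "compat" if mapping.startswith("<") else "canonical"


for cp in CANDIDATES:
    ch = chr(cp)
    # NFD applies canonical mappings only, so a character whose mapping is
    # compatibility-only (or absent) comes back unchanged.
    unchanged = unicodedata.normalize("NFD", ch) == ch
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"{decomposition_kind(cp)}, unchanged by NFD: {unchanged}")
```

Only NFKD (or NFKC) would turn U+00B4 into SPACE plus U+0301; NFD leaves it alone.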

... This means that an algorithm like normalization of whitespace sequences
in XML or HTML should not include SPACEs that are used as base
characters in a combining sequence, and so it should keep two spaces
if the intent is to encode a logical space followed by a logical spacing
diacritic. (This is not a problem for XML which processes strings in their
NFC form).




It is, because there are very many combining marks which do not have spacing equivalents (even for compatibility), and so with these the NFC form will certainly be space plus diacritic.
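
To illustrate, here is a rough sketch in Python (standard unicodedata module only). It first shows that NFC leaves SPACE plus a combining mark as two code points, then collapses runs of spaces while keeping any SPACE that serves as the base of a combining sequence. The function collapse_spaces is hypothetical, written just for this message; it is not how any XML or HTML processor is specified to behave, and it handles only U+0020, not tabs or line breaks.

```python
import unicodedata

RING_ABOVE = "\u030A"  # COMBINING RING ABOVE

# NFC does not compose SPACE + mark into a spacing character, because the
# spacing forms (where they exist) have only compatibility mappings.
assert unicodedata.normalize("NFC", " " + RING_ABOVE) == " " + RING_ABOVE


def collapse_spaces(text: str) -> str:
    """Collapse runs of U+0020, preserving a SPACE used as a base character.

    Hypothetical helper: a SPACE counts as a base when the next character
    has a general category starting with 'M' (Mn, Mc, Me).
    """
    out = []
    prev_plain_space = False
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        is_base = (ch == " " and nxt != ""
                   and unicodedata.category(nxt).startswith("M"))
        if ch == " " and not is_base:
            if prev_plain_space:
                continue  # drop a redundant plain space
            prev_plain_space = True
        else:
            prev_plain_space = False
        out.append(ch)
    return "".join(out)


# One logical space survives, followed by the SPACE that carries the mark.
sample = "the diacritic   " + RING_ABOVE + " may appear"
print(repr(collapse_spaces(sample)))
```

With this approach the answer to the original question stays "two spaces": one as the word separator and one as the base for the diacritic.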

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




