Marcin asked:

> The general trouble is that numeric character references can only
> encode individual code points

By design.

> rather than graphemes (is this a correct
> term for a non-combining code point with a sequence of combining code
> points?).

No. The correct term is "combining character sequence".
TUS 4.0, p. 70, D17.

The correct NCR representation of a combining character sequence
is a sequence of NCRs. -- Not too surprisingly.

--Ken

> So if XML is supposed to be treated as a sequence of
> graphemes, weird effects arise in the above boundary cases...
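
To make the point about NCRs concrete, here is a minimal sketch in Python. The helper name to_ncrs is hypothetical, not part of any library, and the choice of "e" plus U+0301 COMBINING ACUTE ACCENT is just an illustrative combining character sequence. It shows that the sequence is written as one NCR per code point, and that decoding those NCRs yields the original code points.

```python
import html
import unicodedata


def to_ncrs(text: str) -> str:
    """Encode every code point in text as a hexadecimal NCR (hypothetical helper)."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


# One combining character sequence made of two code points:
# base character "e" followed by U+0301 COMBINING ACUTE ACCENT.
sequence = "e\u0301"

# U+0301 has a nonzero canonical combining class, i.e. it is a combining mark.
assert unicodedata.combining(sequence[1]) != 0

ncrs = to_ncrs(sequence)
print(ncrs)  # &#x65;&#x301;

# Decoding the NCRs gives back the same two code points, in order.
assert html.unescape(ncrs) == sequence
```

Nothing here treats the sequence as a single unit: each NCR stands for exactly one code point, and the sequence is simply the NCRs written one after another.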