On 10/08/2004 18:33, Jon Hanna wrote:

...

As for modern markup, consider if instead of &#x0304; you had &#x0338;
By the rules of XML that is treated as if the character U+0338 was there rather
than the escape sequence.
By the rules of Unicode the sequence U+003E, U+0338 is treated the same as the
character U+226F.
By the rules of XML replacing >&#x0338; with U+226F would mean the document was
no longer well-formed.

So even without an explicit spec saying otherwise the above would be
problematic.



This means that the rules of XML conflict with the rules of Unicode. If the string is a Unicode string, U+226F is canonically equivalent to <U+003E, U+0338> and therefore any higher level protocol should treat the two sequences as identical, rather than reject one of them as causing the document to be ill-formed.
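
To make the conflict concrete, here is a rough Python sketch using only the standard library (unicodedata and xml.etree.ElementTree). The element name "a" and the helper is_well_formed are made up for illustration, and the Python parser stands in for "an XML processor" in general, so treat this as a sketch under those assumptions rather than a statement about any particular toolchain.

```python
import unicodedata
import xml.etree.ElementTree as ET

GT_PLUS_SOLIDUS = "\u003e\u0338"   # ">" followed by COMBINING LONG SOLIDUS OVERLAY
NOT_GREATER_THAN = "\u226f"        # NOT GREATER-THAN, canonically equivalent to the above


def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Unicode side: NFC composes <U+003E, U+0338> into U+226F.
composed = unicodedata.normalize("NFC", GT_PLUS_SOLIDUS)

# XML side: the character reference is read as if U+0338 itself were there.
escaped_doc = "<a>&#x0338;</a>"
parsed_text = ET.fromstring(escaped_doc).text

# The same document with the literal character instead of the reference.
literal_doc = "<a>\u0338</a>"

# Normalizing the raw text fuses the tag's closing ">" with U+0338,
# so the start tag is never closed.
normalized_doc = unicodedata.normalize("NFC", literal_doc)

print(composed == NOT_GREATER_THAN)
print(is_well_formed(escaped_doc), is_well_formed(literal_doc))
print(is_well_formed(normalized_doc))
```

The point of the last step is that the ">" here is markup, not character data, which is why a replacement that Unicode regards as harmless changes what the XML parser sees.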


-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/




