Philippe Verdy scripsit:

> And I disagree with you about the fact the U+0000 can't be used in XML
> documents. It can be used in URI through URI escaping mechanism, as
> explicitly indicated in the XML specification...

You have a hold of the right stick but at the wrong end. U+0000 can be
encoded in a URI as %00, but that does not mean that the IRIs in system
ids and namespace names (and potentially other places) can contain
explicit U+0000 characters or &#0; escapes either. Both of those are
illegal, and documents that contain them are not well-formed. In
character content and attribute values, U+0000 is not possible.

> And the fact that the various character productions, that are normally
> normative, have been changed so often, sometimes through erratas that
> were forgotten in the text of the next edition of the standard,

Do you have evidence for this claim?

> The only thing about which I can agree is that XML will forbid surrogates
> and U+FFFE and U+FFFF, but I won't say that a XML parser that does not
> reject NULs or other non-characters or "disallowed" C0 controls is so
> much buggy.

You are of course entitled to your uninformed opinion.

> But all these is also a proof that XML documents are definitely NOT
> plain-text documents, so you can't use Unicode encoding rules at the
> encoded XML document level, only at the finest plain-text nodes (these
> are the levels that the productions in the XML standard are trying, with
> more or less success, to standardize).

You can't blindly do *normalization* of XML documents as if they were
plain text. *Encoding* XML documents according to Unicode is of course
possible and desirable.

> As a consequence any process that blindly applies a plain-text
> normalization to a complete XML document is bogous, because it breaks the
> most basic XML conformance, i.e. the core document structure...

In one extraordinarily unlikely case, yes: the appearance of a combining
overlay slash following the ">" that closes a tag will damage the document
if it is NFC-normalized.

-- 
You are a child of the universe no less         John Cowan
than the trees and all other acyclic            http://www.reutershealth.com
graphs; you have a right to be here.            http://www.ccil.org/~cowan
  --DeXiderata by Sean McGrath                  [EMAIL PROTECTED]
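
P.S. For anyone who wants to check the two technical points above, here is a
minimal sketch. It assumes Python's standard library, whose expat-based
ElementTree parser stands in for "a conforming XML parser"; the helper
is_well_formed, the sample documents, and the example.org URI are all made
up for illustration. It also assumes the "combining overlay slash" is
U+0338 COMBINING LONG SOLIDUS OVERLAY.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the stdlib (expat-based) parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, invented for this illustration.

# "%00" in a URI is three ordinary characters as far as XML is concerned.
uri_escaped = '<doc href="http://example.org/a%00b"/>'

# A character reference to U+0000 does not match the Char production.
char_ref = "<doc>&#0;</doc>"

# Neither does a literal U+0000 in character content.
literal_nul = "<doc>\x00</doc>"

# Assumed code point for the "combining overlay slash".
COMBINING_LONG_SOLIDUS_OVERLAY = "\u0338"

# Character content that happens to begin with the combining character,
# so it directly follows the ">" that closes the start-tag.
before = "<doc>" + COMBINING_LONG_SOLIDUS_OVERLAY + "text</doc>"

# NFC composes ">" + U+0338 into U+226F NOT GREATER-THAN, so the tag
# delimiter disappears from the normalized string.
after = unicodedata.normalize("NFC", before)

for label, doc in [
    ("URI escape %00", uri_escaped),
    ("reference to U+0000", char_ref),
    ("literal U+0000", literal_nul),
    ("before NFC", before),
    ("after NFC", after),
]:
    print(f"{label:22} well-formed: {is_well_formed(doc)}")
```

Only the last case is the normalization problem: the same text is accepted
before NFC and rejected after it, because the start-tag no longer has a
closing ">".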