Philippe Verdy scripsit:

> And I disagree with you about the fact the U+0000 can't be used in XML 
> documents. It can be used in URI through URI escaping mechanism, as 
> explicitly indicated in the XML specification...

You have a hold of the right stick but at the wrong end.  U+0000 can be
encoded in a URI as %00, but that does not mean that the IRIs in system ids
and namespace names (and potentially other places) can contain explicit
U+0000 characters or &#0; escapes either.  Both of those are illegal,
and documents that contain them are not well-formed.

In character content and attribute values, U+0000 is not possible at
all, whether as a literal character or as a character reference.
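
A minimal sketch of that distinction, assuming Python's standard
xml.etree.ElementTree (which is backed by expat) as the parser.  The
helper name well_formed and the example.org URIs are made up for
illustration; the first two documents should be accepted and the last
two rejected:

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def well_formed(doc: bytes) -> bool:
    """Return True if the parser accepts doc, False on a well-formedness error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# %00 inside a system id or namespace name is just three ASCII characters.
escaped_system_id = b'<!DOCTYPE a SYSTEM "http://example.org/x%00.dtd"><a/>'
escaped_namespace = b'<a xmlns="http://example.org/ns%00"/>'

# A character reference to U+0000, or the raw character itself, is not allowed.
char_reference = b'<a b="&#0;"/>'
literal_nul = b"<a>\x00</a>"

for doc in (escaped_system_id, escaped_namespace, char_reference, literal_nul):
    print(doc, well_formed(doc))

# Only URI unescaping, a separate step outside the XML parser, yields U+0000.
print(repr(unquote("x%00")))
```

The parser never interprets the %00; it only becomes U+0000 if some
later consumer of the URI chooses to unescape it.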

> And the fact that the various character productions, that are normally 
> normative, have been changed so often, sometimes through errata that 
> were forgotten in the text of the next edition of the standard,  

Do you have evidence for this claim?

> The only thing about which I can agree is that XML will forbid surrogates 
> and U+FFFE and U+FFFF, but I won't say that a XML parser that does not 
> reject NULs or other non-characters or "disallowed" C0 controls is so 
> much buggy. 

You are of course entitled to your uninformed opinion.

> But all this is also proof that XML documents are definitely NOT 
> plain-text documents, so you can't use Unicode encoding rules at the 
> encoded XML document level, only at the finest plain-text nodes (these 
> are the levels that the productions in the XML standard are trying, with 
> more or less success, to standardize).

You can't blindly do *normalization* of XML documents as if they were
plain text.  *Encoding* XML documents according to Unicode is of course
possible and desirable.

> As a consequence any process that blindly applies a plain-text 
> normalization to a complete XML document is bogus, because it breaks the 
> most basic XML conformance, i.e. the core document structure...

In one extraordinarily unlikely case, yes: the appearance of a
combining overlay slash following the ">" that closes a tag will
damage the document if it is NFC-normalized.
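
A small sketch of that case, assuming the overlay in question is
U+0338 COMBINING LONG SOLIDUS OVERLAY and using Python's unicodedata
and xml.etree.ElementTree modules.  The document string and the helper
name parses are invented for the illustration:

```python
import unicodedata
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if doc is accepted by the XML parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Element content that begins with U+0338, directly after the ">" of the tag.
original = "<a>\u0338</a>"

# NFC composes ">" + U+0338 into U+226F NOT GREATER-THAN, so the
# delimiter that closed the start tag disappears.
normalized = unicodedata.normalize("NFC", original)

print(parses(original))
print(parses(normalized))
print([f"U+{ord(c):04X}" for c in normalized])
```

Normalizing the character data of each node separately, rather than
the serialized document, avoids merging markup delimiters this way.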

-- 
You are a child of the universe no less         John Cowan
than the trees and all other acyclic            http://www.reutershealth.com
graphs; you have a right to be here.            http://www.ccil.org/~cowan
  --DeXiderata by Sean McGrath                  [EMAIL PROTECTED]
