For the record:

On Tuesday, 8 December 2020 23:13:08 CET Sam Whited wrote:
> I don't understand how this is part of the XML data model. Do you mean
> that only Unicode encodings are supported by XML? If so, that's fair and
> removes one of my arguments, I did not know that was the case. However,
> I still think the data on the wire should describe the other data on the
> wire, not some higher-level "decoded" representation that many XML
> libraries may not even use.

Let me dig up the references:

https://www.w3.org/TR/REC-xml/#charsets

> [Definition: A parsed entity contains text, a sequence of characters,
> which may represent markup or character data.]

text = sequence of characters, representing markup or character data

https://www.w3.org/TR/REC-xml/#syntax

> [Definition: All text that is not markup constitutes the character data
> of the document.]

Ok, so we have text which is a sequence of characters, and what isn’t
markup is character data. Now what are characters in XML? Back to:

https://www.w3.org/TR/REC-xml/#charsets

> [Definition: A character is an atomic unit of text as specified by
> ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage
> return, line feed, and the legal characters of Unicode and ISO/IEC 10646.
> The versions of these standards cited in A.1 Normative References were
> current at the time this document was prepared. New characters may be
> added to these standards by amendments or new editions. Consequently,
> XML processors MUST accept any character in the range specified for
> Char. ]

That is the definition of a subset of the Unicode code point range:

> [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
>              [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>     /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

kind regards,
Jonas
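
To make the Char production quoted above concrete, here is a minimal sketch in Python of how one might check it. The names `XML_CHAR_RANGES`, `is_xml_char` and `first_illegal` are hypothetical and chosen only for this illustration; they come neither from the XML specification nor from any XML library. The ranges are copied from production [2] of XML 1.0.

```python
# Inclusive code point ranges copied from production [2] Char of XML 1.0.
XML_CHAR_RANGES = (
    (0x9, 0x9),           # tab
    (0xA, 0xA),           # line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),       # everything up to the surrogate blocks
    (0xE000, 0xFFFD),     # after the surrogates, excluding FFFE and FFFF
    (0x10000, 0x10FFFF),  # supplementary planes
)


def is_xml_char(ch: str) -> bool:
    """Return True if the single code point `ch` matches production [2] Char."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML_CHAR_RANGES)


def first_illegal(text: str) -> int:
    """Return the index of the first code point not matching Char, or -1."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return -1
```

The check operates on code points of an already decoded string, so it behaves the same regardless of which encoding the bytes on the wire used. This only mirrors the grammar of XML 1.0; it says nothing about what a particular parser actually enforces.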