Sam, your argument appears to be "I want to handle everything as bytes without 
doing any string decoding, so any other option would be more effort (less 
efficient) for me."

XML is defined as a sequence of characters, not bytes - those characters 
subsequently need to be transformed into bytes for the purpose of 
storage/transmission, and that transformation is defined by the encoding 
scheme (UTF-8 in this case). Working in bytes is convenient for you, but not 
for everyone else using a language that does the decoding upfront. The 
decoding _should_ be done upfront - that's how you get a valid XML document.
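To make the "decode upfront" point concrete, here's a minimal Python sketch (hypothetical snippet, not from this thread): decoding happens once, at the I/O boundary, and everything after that point deals in characters.

```python
# UTF-8 bytes as they arrive off the wire (hypothetical stanza):
data = b'<message><body>caf\xc3\xa9</body></message>'

# Decode once, at the boundary:
text = data.decode('utf-8')

# From here on, it's all characters - no byte arithmetic anywhere:
assert 'café' in text        # character search just works
assert len('café') == 4      # four characters, even though it's five bytes
```

After this step, questions like "how many characters is this?" or "does this string contain this character?" have straightforward answers.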

If you're trying to handle XML without first decoding from UTF-8 so you can 
save a few clock cycles, that's cool, but you are going to run into awkward 
annoyances when it comes to handling such alien concepts as characters. The 
reason you can mostly get away with not decoding is that the ASCII range 
(bytes 0x00-0x7F) is represented identically in UTF-8, so you can pretend the 
XML tags are encoded as ASCII characters and treat any Unicode strings as 
opaque binary blobs - but that is only a convenient hack. If everyone else is 
to go along with your convenient hack, they will just have to deal with their 
own awkward annoyances for the terrible decision of decoding strings before 
handling them (as if that's what you're actually supposed to do).
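A small Python sketch of why the byte-level hack mostly works, and where it breaks (illustrative only; the document and tag names are made up):

```python
# Every byte in the ASCII range (0x00-0x7F) is encoded identically in UTF-8,
# so purely-ASCII markup can be found by byte search without decoding:
doc = '<body>naïve</body>'.encode('utf-8')
assert doc.index(b'<body>') == 0          # the tag bytes are plain ASCII

# But byte count and character count diverge as soon as non-ASCII appears:
assert len(doc) == 19                     # 'ï' occupies two bytes in UTF-8
assert len('<body>naïve</body>') == 18    # ...but it is one character

# Any operation that actually needs characters forces a decode anyway:
assert len(doc[6:-7].decode('utf-8')) == 5   # 'naïve' is five characters
```

The tag search succeeds precisely because of the ASCII/UTF-8 overlap; the length mismatch is the kind of awkward annoyance that surfaces the moment the content stops being an opaque blob.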

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________