On 27/09/2018 10:59, Roumen Petrov wrote:
Let's consider the case of "file" mode.
Let's consider the case of "stream" mode.
I'm not only talking about xmllint but the serialization API (xmlSave*,
xmlNodeDump*) in general.
Now, about the test samples above: if the content is stored in a file, xmllint
works fine with the encoding (= codeset = charset).
$ cat test-noencoding.xml
<?xml version="1.0"?><doc>Käse</doc>
No, it doesn't work fine:
$ xmllint test-noencoding.xml
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>
(2) Next, the a-umlaut character is encoded as a hexadecimal character
reference. This is a minor inconsistency between "stream" and "file" mode.
As shown above, "file" mode can also produce unwanted numeric character
references.
(3) The problem is that in "stream" mode the xmllint application ignores the
value of the --encode argument:
$ echo '<?xml version="1.0"?><doc>Käse</doc>' | xmllint - --encode UTF-8
<?xml version="1.0"?>
<doc>K&#xE4;se</doc>
Right, there is an inconsistency in xmllint. But that's not my point.
From my point of view, (1) and (2) are minor, unimportant issues. Only (3)
could be fixed, with low priority.
Unneeded numeric character references in UTF-8 output are not a minor issue.
If you're working with non-Latin scripts, it makes serialized XML files
unreadable for humans and blows up the file size.
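
To put rough numbers on the size argument, here is a minimal Python sketch. It
is not libxml2 code: the helper names escape_non_ascii and sizes are made up
for illustration, and it simply assumes that every non-ASCII character is
written as a hexadecimal character reference instead of raw UTF-8.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference.

    Hypothetical helper that only mimics the shape of the escaped output;
    it does not call libxml2.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


def sizes(text: str) -> tuple[int, int]:
    """Return (bytes as raw UTF-8, bytes when non-ASCII characters are escaped)."""
    raw = len(text.encode("utf-8"))
    escaped = len(escape_non_ascii(text).encode("ascii"))
    return raw, escaped


if __name__ == "__main__":
    # Latin with one umlaut, Cyrillic, and CJK samples.
    for sample in ("Käse", "Привет", "日本語"):
        raw, escaped = sizes(sample)
        print(f"{sample}: {raw} bytes raw, {escaped} bytes escaped "
              f"({escape_non_ascii(sample)})")
```

For mostly-ASCII text the overhead is small, but in this sketch the Cyrillic
sample grows from 12 to 42 bytes and the escaped form is no longer readable.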
Nick
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml