On Thu, Sep 27, 2018 at 02:22:55PM +0200, Nick Wellnhofer wrote:
> On 27/09/2018 10:59, Roumen Petrov wrote:
> > Let's consider the case as "file" mode.
>
> > Let's consider the case as "stream" mode.
>
> I'm not only talking about xmllint but the serialization API (xmlSave*,
> xmlNodeDump*) in general.
>
> > Now about the above test samples: if content is stored in a file, xmllint
> > works fine with encoding(=codeset=charset).
> >
> > $ cat test-noencoding.xml
> > <?xml version="1.0"?><doc>Käse</doc>
>
> No, it doesn't work fine:
>
> $ xmllint test-noencoding.xml
> <?xml version="1.0"?>
> <doc>K&#xE4;se</doc>
>
> > (2) Next, the a-umlaut character is encoded in hexadecimal. Minor
> > inconsistency between "stream" and "file" mode.
>
> As shown above, "file" mode can also produce unwanted numeric character
> references.
>
> > (3) The problem is that in "stream" mode the xmllint application ignores
> > the value of the encode argument:
> > $ echo '<?xml version="1.0"?><doc>Käse</doc>' | xmllint - --encode UTF-8
> > <?xml version="1.0"?>
> > <doc>Käse</doc>
>
> Right, there is an inconsistency in xmllint. But that's not my point.
>
> > From my point of view (1) and (2) are minor, unimportant issues. Only
> > (3) could be fixed with low priority.
>
> Unneeded numeric character references in UTF-8 output are not a minor issue.
> If you're working with non-Latin scripts, it makes serialized XML files
> unreadable for humans and blows up the file size.
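
For a sense of scale on the file-size point, here is a rough sketch in Python. It does not call libxml2; the helper `to_char_refs` is hypothetical and only mimics the effect being discussed, replacing every non-ASCII character with a hexadecimal numeric character reference. The sample strings are made up for illustration.

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal character reference.

    Hypothetical helper for illustration only; it imitates the kind of
    output discussed in this thread, not libxml2's actual serializer.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


# Illustrative samples: Latin with one umlaut, Cyrillic, and CJK text.
samples = ["Käse", "Привет", "日本語"]

for s in samples:
    escaped = to_char_refs(s)
    raw_bytes = len(s.encode("utf-8"))          # size when written as raw UTF-8
    esc_bytes = len(escaped.encode("ascii"))    # size when written as references
    print(f"{s}: {raw_bytes} bytes raw, {esc_bytes} bytes escaped -> {escaped}")
```

Under these assumptions "Käse" grows from 5 to 9 bytes, while the six-letter Cyrillic sample grows from 12 to 42 bytes, since each two-byte character becomes a seven-byte reference.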

Not breaking a decade of programs that may be expecting that behaviour
sounds far more important to me, honestly.

Daniel

-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/