On Thu, Sep 27, 2018 at 02:22:55PM +0200, Nick Wellnhofer wrote:
> On 27/09/2018 10:59, Roumen Petrov wrote:
> > Let's consider the "file" mode case.
> 
> > Let's consider the "stream" mode case.
> 
> I'm not only talking about xmllint but the serialization API (xmlSave*,
> xmlNodeDump*) in general.
> 
> > Now, about the test samples above: if the content is stored in a file,
> > xmllint works fine with the encoding (= codeset = charset).
> > 
> > $ cat test-noencoding.xml
> > <?xml version="1.0"?><doc>Käse</doc>
> 
> No, it doesn't work fine:
> 
> $ xmllint test-noencoding.xml
> <?xml version="1.0"?>
> <doc>K&#xE4;se</doc>
> 
> > (2) Next, the a-umlaut character is encoded in hexadecimal. This is a
> > minor inconsistency between "stream" and "file" mode.
> 
> As shown above, "file" mode can also produce unwanted numeric character
> references.
> 
> > (3) The problem is that in "stream" mode the xmllint application ignores
> > the value of the --encode argument:
> > $ echo '<?xml version="1.0"?><doc>Käse</doc>' | xmllint - --encode UTF-8
> > <?xml version="1.0"?>
> > <doc>K&#xE4;se</doc>
> 
> Right, there is an inconsistency in xmllint. But that's not my point.
> 
> > From my point of view, (1) and (2) are minor, unimportant issues. Only
> > (3) could be fixed, with low priority.
> 
> Unneeded numeric character references in UTF-8 output are not a minor issue.
> If you're working with non-Latin scripts, it makes serialized XML files
> unreadable for humans and blows up the file size.

  Not breaking a decade of programs that may be expecting that behaviour
sounds far more important to me, honestly.

Daniel

-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
