On Tue, Sep 25, 2018 at 01:19:51PM +0200, Nick Wellnhofer wrote:
> libxml2 serializes documents without an encoding declaration differently
> than documents with an explicit UTF-8 encoding:
> 
> $ echo '<?xml version="1.0"?><doc>Käse</doc>' |xmllint -
> <?xml version="1.0"?>
> <doc>K&#xE4;se</doc>
> 
> $ echo '<?xml version="1.0" encoding="utf-8"?><doc>Käse</doc>' |xmllint -
> <?xml version="1.0" encoding="utf-8"?>
> <doc>Käse</doc>
> 
> Since the encoding should default to UTF-8, can anyone explain why this
> decision was made?

  Because numeric character references are part of the core XML spec,
there is no way they can be misinterpreted when people manipulate
documents, e.g. cutting parts of an XML document and pasting them
somewhere else where the context (and the declared encoding) may be
different. So if you don't explicitly ask for an encoding, libxml2
delivers the most resilient serialization possible, and that means
using character references for non-ASCII content, except where that is
not possible (and then there are specifics about attribute
serialization, etc ...)
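To illustrate the idea (this is only a sketch in Python, not libxml2's
actual code): when no encoding is declared, every character outside
ASCII is written as a numeric character reference, so the output
survives re-encoding or copy/paste into a context with a different
declared encoding.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a numeric character
    reference (&#xNN;), the way libxml2 serializes text when the
    document carries no encoding declaration. Illustrative sketch only.
    """
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )

print(escape_non_ascii("<doc>Käse</doc>"))
# -> <doc>K&#xE4;se</doc>
```

The escaped form is pure ASCII, so it is valid in any encoding an XML
parser supports, which is exactly the resilience argument above.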
  Please keep it that way: you never know what people may have come to
depend on, and unless this really fixes an issue I would be very
reluctant to change this behaviour.

 thanks,

Daniel

-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml