It appears that libxslt1.1 pays attention to the charset declaration in the
Content-Type HTTP header when retrieving XML files with MIME types of
application/xml or text/xml via the document() function. If a misconfigured
web server sends "Content-Type: text/xml; charset=iso-8859-15" but the XML
file itself has no encoding declaration in the XML prolog (and is thus to be
taken as UTF-8), libxslt treats the incoming file as ISO-8859-15 and so
mangles byte sequences that express e.g. many common vowels with diacritics.

libxslt does not exhibit this behavior when the MIME type is 'text/html'.
Saxon 6.5.5 does not exhibit it with any MIME type/charset combination.
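
To illustrate the kind of mangling described above, here is a minimal Python
sketch. It is not libxslt code; it only shows what happens to UTF-8 bytes when
a consumer decodes them as ISO-8859-15, which is what I assume is happening
when the HTTP charset parameter is trusted. The sample character is my own
choice.

```python
# UTF-8 bytes for "é", as they would sit in an XML file with no
# encoding declaration (the sample character is an assumption).
raw = "\u00e9".encode("utf-8")  # two bytes: 0xC3 0xA9

# What a consumer sees if it trusts "charset=iso-8859-15" from the header:
# each byte is read as a separate Latin-9 character.
mangled = raw.decode("iso-8859-15")

# What it sees if it falls back to the XML default of UTF-8.
correct = raw.decode("utf-8")

print(repr(mangled))  # 'Ã©'
print(repr(correct))  # 'é'
```

The single character becomes two, which matches the symptom of vowels with
diacritics turning into two-character sequences in the output.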

I am attaching a test stylesheet that takes itself as input, retrieves a
simple file in UTF-8 and Latin-9 encodings from a web server, and outputs the
results with MIME types and charsets noted. I have confirmed the bug in
libxslt 1.1.24. Would anyone care to check it in more recent versions before I
log the bug?
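
For anyone who wants to reproduce this without a misconfigured server at hand,
the following is a rough sketch of a local test server using only the Python
standard library. It is not the setup used for the attached stylesheet: the
paths, the sample document, and the helper names are all hypothetical, and it
serves the same UTF-8 bytes under each Content-Type rather than separate
UTF-8 and Latin-9 files.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# UTF-8 encoded XML with no encoding declaration in the prolog.
BODY = '<?xml version="1.0"?><doc>caf\u00e9</doc>'.encode("utf-8")

# Hypothetical paths mapped to the Content-Type header each one advertises.
CONTENT_TYPES = {
    "/utf8.xml": "text/xml; charset=utf-8",
    "/latin9.xml": "text/xml; charset=iso-8859-15",
    "/html.xml": "text/html; charset=iso-8859-15",
}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ctype = CONTENT_TYPES.get(self.path)
        if ctype is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # suppress per-request logging


def start_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Fetch a path directly (ignoring any proxy settings) to check headers."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d%s" % (server.server_address[1], path)
    with opener.open(url) as resp:
        return resp.headers["Content-Type"], resp.read()
```

With the server running, a stylesheet can point document() at the
/latin9.xml and /utf8.xml URLs on the chosen port and compare the output for
each.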
Thanks,
Chuck
--
Chuck Bearden ([email protected] ; 713.348.3661)
XML Engineer, Connexions
http://cnx.org/
test.xsl
Description: XML document