[
https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773501#action_12773501
]
Age Jan Kuperus commented on SOLR-412:
--------------------------------------
IMHO the XSLT 1.0 documentation (http://www.w3.org/TR/xslt#output) is a bit
clearer on the usage of these attributes:
"The method attribute on xsl:output identifies the overall method that should
be used for outputting the result tree. The value must be a QName. If the QName
does not have a prefix, then it identifies a method specified in this document
and must be one of xml, html or text."
"encoding specifies the preferred character encoding that the XSLT processor
should use to encode sequences of characters as sequences of bytes; the value
of the attribute should be treated case-insensitively; the value must contain
only characters in the range #x21 to #x7E (i.e. printable ASCII characters);
the value should either be a charset registered with the Internet Assigned
Numbers Authority [IANA], [RFC2278] or start with X-"
"media-type specifies the media type (MIME content type) of the data that
results from outputting the result tree; the charset parameter should not be
specified explicitly; instead, when the top-level media type is text, a charset
parameter should be added according to the character encoding actually used by
the output method"
If I understand this correctly, this means the correct output specification is
<xsl:output method="xml" encoding="utf-8" />, and <xsl:output
media-type="text/xml; charset=UTF-8"/> should never be used.
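To check what a given stylesheet actually declares, the standard JAXP API exposes the xsl:output attributes as output properties. The following is only a minimal sketch, assuming the JDK's built-in XSLT processor; the class name OutputPropertiesSketch, the helper outputProperties() and the SAMPLE stylesheet are made up for illustration and are not part of Solr:

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not Solr code: reads the xsl:output settings of a stylesheet.
final class OutputPropertiesSketch {

    // Stylesheet using the output specification recommended above.
    static final String SAMPLE =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" encoding=\"utf-8\"/>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and returns its output properties.
    static Properties outputProperties(String stylesheet) {
        try {
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
            return templates.getOutputProperties();
        } catch (TransformerConfigurationException e) {
            throw new IllegalArgumentException("stylesheet could not be compiled", e);
        }
    }

    public static void main(String[] args) {
        Properties props = outputProperties(SAMPLE);
        System.out.println("method     = " + props.getProperty(OutputKeys.METHOD));
        System.out.println("encoding   = " + props.getProperty(OutputKeys.ENCODING));
        System.out.println("media-type = " + props.getProperty(OutputKeys.MEDIA_TYPE));
    }
}
```

Note that a processor may report its own default for an attribute the stylesheet does not declare (media-type in this sample), so the printed values can differ between XSLT implementations.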
My suggestion would be to change XSLTResponseWriter.getContentType() in such a
way that (in pseudocode):
if encoding is null
    encoding = "utf-8"
end if
if media-type is not null
    /* next if is for compatibility with the workaround only */
    if media-type contains "charset="
        return media-type
    else
        return media-type + "; charset=" + encoding
    end if
else
    if method is "html" or the first element in the final output is <html>
        media-type = "text/html"
    elseif method is "text"
        media-type = "text/plain"
    else /* it must be xml */
        media-type = "text/xml"
    end if
    return media-type + "; charset=" + encoding
end if
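The same logic as a rough Java sketch. This is not the actual XSLTResponseWriter code: the class name XsltContentTypeSketch and the three-argument contentType() signature are assumptions made for illustration, and the check for a leading <html> element in the output is left out because it needs access to the result tree.

```java
import java.util.Locale;

// Hypothetical sketch of the proposed content-type logic; not Solr's implementation.
final class XsltContentTypeSketch {

    // method, encoding and mediaType are the xsl:output values; null means "not specified".
    static String contentType(String method, String encoding, String mediaType) {
        String charset = (encoding == null) ? "utf-8" : encoding;

        if (mediaType != null) {
            // Compatibility with the workaround: keep an explicit charset untouched.
            if (mediaType.toLowerCase(Locale.ROOT).contains("charset=")) {
                return mediaType;
            }
            return mediaType + "; charset=" + charset;
        }

        String base;
        if ("html".equalsIgnoreCase(method)) {
            base = "text/html";
        } else if ("text".equalsIgnoreCase(method)) {
            base = "text/plain";
        } else {
            base = "text/xml"; // it must be xml
        }
        return base + "; charset=" + charset;
    }

    public static void main(String[] args) {
        // <xsl:output method="xml" encoding="utf-8"/>
        System.out.println(contentType("xml", "utf-8", null));
        // The workaround from the issue description
        System.out.println(contentType("xml", "UTF-8", "text/xml; charset=UTF-8"));
    }
}
```

With this, the stylesheet from the issue description would get a UTF-8 Content-Type without the media-type workaround, while stylesheets that already use the workaround keep working.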
> XsltWriter does not output UTF-8 by default
> -------------------------------------------
>
> Key: SOLR-412
> URL: https://issues.apache.org/jira/browse/SOLR-412
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 1.2
> Environment: Tomcat 5.5
> Linux Red Hat ES4 (2.6.9-5.ELsmp from 'uname -a')
> Reporter: Lance Norskog
>
> XsltWriter outputs XML text in ISO-8859-1 encoding by default.
> Tomcat 5.5 has URIEncoding="UTF-8" set in the <Connector> element as
> described in the Wiki.
> This output description in the XML:
> <xsl:output method="xml" encoding="utf-8" />
> gives output with this header:
> HTTP/1.1 200 OK
> Server: Apache-Coyote/1.1
> Content-Type: text/xml;charset=ISO-8859-1
> Transfer-Encoding: chunked
> Date: Wed, 14 Nov 2007 17:49:11 GMT
> I had to change the <xsl:output> directive to this:
> <xsl:output media-type="text/xml; charset=UTF-8" encoding="UTF-8"/>
> This is the root cause of SOLR-233.