[ 
https://issues.apache.org/jira/browse/SOLR-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773501#action_12773501
 ] 

Age Jan Kuperus commented on SOLR-412:
--------------------------------------

IMHO the XSLT 1.0 specification (http://www.w3.org/TR/xslt#output) is a bit 
clearer on the usage of these attributes:

"The method attribute on xsl:output identifies the overall method that should 
be used for outputting the result tree. The value must be a QName. If the QName 
does not have a prefix, then it identifies a method specified in this document 
and must be one of xml, html or text."

"encoding specifies the preferred character encoding that the XSLT processor 
should use to encode sequences of characters as sequences of bytes; the value 
of the attribute should be treated case-insensitively; the value must contain 
only characters in the range #x21 to #x7E (i.e. printable ASCII characters); 
the value should either be a charset registered with the Internet Assigned 
Numbers Authority [IANA], [RFC2278] or start with X-"

"media-type specifies the media type (MIME content type) of the data that 
results from outputting the result tree; the charset parameter should not be 
specified explicitly; instead, when the top-level media type is text, a charset 
parameter should be added according to the character encoding actually used by 
the output method"

If I understand this correctly, the correct output specification is 
<xsl:output method="xml" encoding="utf-8" />, and a media-type that carries 
an explicit charset, as in <xsl:output media-type="text/xml; charset=UTF-8"/>, 
should never be used. 

My suggestion would be to change XSLTResponseWriter.getContentType() in such a 
way that (in pseudocode):
if encoding is null
  encoding = "utf-8"
end if
if media-type is not null
  /* the next check is only for compatibility with the workaround */
  if media-type contains "charset="
    return media-type
  else
    return media-type + "; charset=" + encoding
  end if
else
  if method is "html" or the first element in the final output is <html>
    media-type = "text/html"
  elseif method is "text"
    media-type = "text/plain"
  else /* it must be xml */
    media-type = "text/xml"
  end if
  return media-type + "; charset=" + encoding
end if
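
For illustration only, the pseudocode above could be sketched in Java roughly 
as follows. The class name ContentTypeSketch, the method contentType() and its 
parameters are hypothetical and are not taken from the Solr source; the sketch 
assumes the caller has already read the media-type, method and encoding 
attributes from xsl:output and knows whether the first output element is 
<html>. Matching "charset=" case-insensitively is an added assumption.

```java
import java.util.Locale;

/** Hypothetical helper; NOT the actual XSLTResponseWriter code. */
final class ContentTypeSketch {
    private ContentTypeSketch() {}

    /**
     * Builds a Content-Type header value from xsl:output settings.
     *
     * @param mediaType  xsl:output media-type, or null if absent
     * @param method     xsl:output method, or null if absent
     * @param encoding   xsl:output encoding, or null if absent
     * @param rootIsHtml true if the first output element is <html>
     */
    static String contentType(String mediaType, String method,
                              String encoding, boolean rootIsHtml) {
        // Fall back to UTF-8 when the stylesheet names no encoding.
        if (encoding == null) {
            encoding = "utf-8";
        }
        if (mediaType != null) {
            // Compatibility with the workaround: keep a media-type that
            // already carries a charset (case-insensitive match is an
            // assumption of this sketch).
            if (mediaType.toLowerCase(Locale.ROOT).contains("charset=")) {
                return mediaType;
            }
            return mediaType + "; charset=" + encoding;
        }
        // No media-type given: derive it from the output method.
        String type;
        if ("html".equals(method) || rootIsHtml) {
            type = "text/html";
        } else if ("text".equals(method)) {
            type = "text/plain";
        } else {
            type = "text/xml"; // anything else is treated as xml
        }
        return type + "; charset=" + encoding;
    }
}
```

With the stylesheet from the issue description (method="xml", 
encoding="utf-8", no media-type), contentType(null, "xml", "utf-8", false) 
returns "text/xml; charset=utf-8".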

> XsltWriter does not output UTF-8 by default
> -------------------------------------------
>
>                 Key: SOLR-412
>                 URL: https://issues.apache.org/jira/browse/SOLR-412
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.2
>         Environment: Tomcat 5.5
> Linux Red Hat ES4  (2.6.9-5.ELsmp from 'uname -a')
>            Reporter: Lance Norskog
>
> XsltWriter outputs XML text in ISO-8859-1 encoding by default.
> Tomcat 5.5 has URIEncoding="UTF-8" set in the <Connector> element as 
> described in the Wiki.
> This output description in the XML: 
> <xsl:output method="xml" encoding="utf-8" />
> gives output with this header:
> HTTP/1.1 200 OK
> Server: Apache-Coyote/1.1
> Content-Type: text/xml;charset=ISO-8859-1
> Transfer-Encoding: chunked
> Date: Wed, 14 Nov 2007 17:49:11 GMT
> I had to change the <xsl:output> directive to this:
>  <xsl:output media-type="text/xml; charset=UTF-8" encoding="UTF-8"/>
> This is the root cause of SOLR-233.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
