[ 
https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606610#action_12606610
 ] 

Lars Kotthoff commented on SOLR-443:
------------------------------------

I'm also using Tomcat 5.5.26 here, but I can't reproduce that behaviour. I've 
tested on two different machines, and my Tomcat always assumes that the POST 
body is URL-encoded ISO-8859-1. That is, with the current SVN version it only 
works for ASCII characters (which are encoded identically in ISO-8859-1 and 
UTF-8). If I remove the line that sets the encoding of the POST body to UTF-8, 
it works for all ISO-8859-1 characters, as httpclient encodes to ISO-8859-1 by 
default.
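
To make the mismatch concrete, here is a minimal sketch in Python (not the 
SolrJ or Tomcat code) of what happens when one side URL-encodes a value in 
UTF-8 and the other side decodes it as ISO-8859-1. The helper names 
encode_form_value and decode_form_value are made up for this illustration.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_form_value(value, charset):
    """URL-encode a form value the way a client would, using `charset`."""
    return quote_plus(value, safe="", encoding=charset)


def decode_form_value(encoded, charset):
    """URL-decode a form value the way a server would, assuming `charset`."""
    return unquote_plus(encoded, encoding=charset, errors="replace")


# Hypothetical query term containing one non-ASCII character.
query = "café"

sent = encode_form_value(query, "utf-8")
print(sent)                                   # caf%C3%A9
print(decode_form_value(sent, "iso-8859-1"))  # cafÃ©  (mis-decoded)

# Pure ASCII survives the mismatch, which is why the bug is easy to miss.
print(decode_form_value(encode_form_value("solr", "utf-8"), "iso-8859-1"))  # solr

# If the client encodes in ISO-8859-1, a server assuming ISO-8859-1 is happy.
print(decode_form_value(encode_form_value(query, "iso-8859-1"), "iso-8859-1"))  # café
```

The second line of output is the kind of silently corrupted value that ends 
up in the query when the two sides disagree about the charset.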

I'm very much in favour of a solution which works because all encodings are 
specified in the proper places as opposed to something that just happens to 
work with a "standard" configuration, but is not covered by any internet 
standard. This would be a timebomb just waiting to go off when somebody 
switches servlet container versions/configurations.

Worse still, this problem is likely to affect people who just use Solr without 
writing their own code for it and who don't know anything about the internals 
(cf. SOLR-303). They aren't going to get an error message telling them that 
the character encoding is wrong, but a NullPointerException from the bowels of 
the faceting code.

The overhead from using multi-part requests may be considerable, but I don't 
think that network I/O and processing of network messages is likely to become a 
bottleneck in typical Solr applications.
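
As a rough illustration of both points (explicit encodings and the extra 
bytes), here is a sketch in Python that builds a multipart/form-data body in 
which every part declares its own charset. It is not the code from the 
attached patch; the function name build_multipart, the fixed boundary and the 
sample parameter are assumptions made for the example, and parameter names 
are assumed to be ASCII.

```python
import uuid


def build_multipart(params, charset="UTF-8", boundary=None):
    """Return (headers, body) for a multipart/form-data POST.

    Each part carries its own Content-Type with an explicit charset, so the
    receiver does not have to guess how the value bytes are encoded.
    """
    boundary = boundary or uuid.uuid4().hex
    body = b""
    for name, value in params.items():
        part_headers = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: text/plain; charset={charset}\r\n"
            "\r\n"
        )
        # Part headers are plain ASCII; only the value uses `charset`.
        body += part_headers.encode("ascii") + value.encode(charset) + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical single-parameter query; the boundary is fixed for readability.
headers, body = build_multipart({"q": "café"}, boundary="XYZ")
urlencoded = b"q=caf%C3%A9"
print(headers["Content-Type"])     # multipart/form-data; boundary=XYZ
print(len(body), len(urlencoded))  # multipart body is several times larger
```

For a short query the multipart body is many times larger than the 
url-encoded one, but the absolute difference is roughly a hundred bytes per 
parameter.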

> POST queries don't declare its charset
> --------------------------------------
>
>                 Key: SOLR-443
>                 URL: https://issues.apache.org/jira/browse/SOLR-443
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.2
>         Environment: Tomcat 6.0.14
>            Reporter: Andrew Schurman
>            Priority: Minor
>         Attachments: SOLR-443-multipart.patch, solr-443.patch, 
> solr-443.patch, SolrDispatchFilter.patch
>
>
> When sending a query via POST, the content-type is not set. The content 
> charset for the POST parameters is set, but this only appears to be used for 
> creating the Content-Length header in the commons library. Since a query is 
> encoded in UTF-8, the HTTP headers should also specify the content type charset.
> On Tomcat, this causes problems when the query string contains non-ASCII 
> characters (characters with accents and such) as it tries to parse the POST 
> body in its default ISO-8859-1. There appears to be no way to set/change the 
> default encoding for a message body on Tomcat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
