[ 
https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607006#action_12607006
 ] 

Yonik Seeley commented on SOLR-443:
-----------------------------------

bq. I can confirm that setting the content type manually to 
"application/x-www-form-urlencoded; charset=UTF-8" works, but that seems like a 
dirty hack to me. There's no standard/specification/.. covering that.

I agree it's a bit hackish... but that's the state of things.  I'm more 
concerned about whether it actually works everywhere (and I was surprised that 
it seems to).  I imagine in the future, UTF-8 will be the standard... there's no 
getting around it unless one wants to just ban x-www-form-urlencoded POST for 
non-ASCII, and that doesn't seem reasonable.
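
As a rough illustration of the workaround discussed above (not code from any of the attached patches), the sketch below form-encodes parameters as UTF-8 and pairs the body with the explicit Content-Type value quoted in the comment. The class name Utf8FormPost and the helper encodeForm are hypothetical; only java.net.URLEncoder from the JDK is used, and nothing is sent over the network.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class Utf8FormPost {
    // The header value from the comment: the charset parameter tells the
    // server how to interpret the percent-decoded bytes.
    static final String CONTENT_TYPE =
            "application/x-www-form-urlencoded; charset=UTF-8";

    // Builds an x-www-form-urlencoded body with UTF-8 percent-encoding.
    static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be present on every JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "caf\u00e9"); // "cafe" with an acute accent on the e
        String body = encodeForm(params);
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println("Content-Length: " + body.length());
        System.out.println();
        System.out.println(body); // q=caf%C3%A9
    }
}
```

Whether a given container honours the charset parameter on this media type is exactly the open question raised above, so this is something to verify per server rather than assume.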

I started using POST because the queries could go over the size limits of GET 
(so that's yet another hack).  Using multi-part would really blow up the size 
of these requests, and could actually become a bottleneck when the number of 
servers is high.
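
To make the size concern concrete, here is a minimal sketch that builds the same parameters both ways and compares byte counts. It is an assumption-laden approximation: the class FormSizeComparison, the boundary string, and the sample parameter values are invented for illustration, and the multipart body carries only the minimum per-part headers (a real client would typically add more).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical comparison helper, for illustration only.
final class FormSizeComparison {
    // Size in bytes of an x-www-form-urlencoded body.
    static int urlEncodedSize(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        // Percent-encoded output is pure ASCII, so chars equal bytes.
        return body.length();
    }

    // Size in bytes of a minimal multipart/form-data body:
    // one part per parameter, each with only a Content-Disposition header.
    static int multipartSize(Map<String, String> params, String boundary) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.append("--").append(boundary).append("\r\n");
            body.append("Content-Disposition: form-data; name=\"")
                .append(e.getKey()).append("\"\r\n");
            body.append("\r\n");
            body.append(e.getValue()).append("\r\n");
        }
        body.append("--").append(boundary).append("--\r\n");
        return body.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solr");   // sample values, not from the issue
        params.put("start", "0");
        params.put("rows", "10");
        System.out.println("urlencoded bytes: " + urlEncodedSize(params));
        System.out.println("multipart bytes:  "
                + multipartSize(params, "----hypotheticalBoundary"));
    }
}
```

The per-parameter overhead in multipart (boundary line plus headers) is fixed, so the gap is widest for requests made of many short parameters.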

> POST queries don't declare its charset
> --------------------------------------
>
>                 Key: SOLR-443
>                 URL: https://issues.apache.org/jira/browse/SOLR-443
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.2
>         Environment: Tomcat 6.0.14
>            Reporter: Andrew Schurman
>            Priority: Minor
>         Attachments: SOLR-443-multipart.patch, solr-443.patch, 
> solr-443.patch, SolrDispatchFilter.patch
>
>
> When sending a query via POST, the content-type is not set. The content 
> charset for the POST parameters is set, but this only appears to be used for 
> creating the Content-Length header in the commons library. Since a query is 
> encoded in UTF-8, the HTTP headers should also specify content type charset.
> On Tomcat, this causes problems when the query string contains non-ASCII 
> characters (characters with accents and such) as it tries to parse the POST 
> body in its default ISO-8859-1. There appears to be no way to set/change the 
> default encoding for a message body on Tomcat.
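
The failure mode described in the issue can be reproduced without a servlet container. A minimal sketch, assuming the server simply percent-decodes the body with ISO-8859-1 when no charset is declared; the class name CharsetMismatchDemo and its method are made up for illustration, and java.net.URLDecoder stands in for the container's parameter parsing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, for illustration only.
final class CharsetMismatchDemo {
    // Decodes a percent-encoded form value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // "caf" + U+00E9, percent-encoded by the client as UTF-8.
        String wire = "caf%C3%A9";

        // Server assumes ISO-8859-1: each byte becomes its own character,
        // giving "caf" + U+00C3 + U+00A9 (two characters instead of one).
        String wrong = decode(wire, "ISO-8859-1");

        // Server told to use UTF-8: the two bytes form a single U+00E9.
        String right = decode(wire, "UTF-8");

        System.out.println("ISO-8859-1 length: " + wrong.length()); // 5
        System.out.println("UTF-8 length:      " + right.length()); // 4
    }
}
```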

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
