[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605868#action_12605868 ]
Lars Kotthoff commented on SOLR-303:
------------------------------------

Yonik, thanks for taking a look at it. I've investigated this issue further and I believe I now know what the root cause is.

The line

{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
...
{code}

tells the *sender* to encode the data as UTF-8. The way the *receiver* decodes the data depends on whatever charset is set in the Content-Type header. This header is currently added automatically by httpclient and, as you can see in the netcat log, is "application/x-www-form-urlencoded", i.e. without a charset. The default charset is ISO-8859-1 (cf. [http://hc.apache.org/httpclient-3.x/charencodings.html]). So the data is *encoded* as UTF-8 but *decoded* as ISO-8859-1, which causes the effect I described earlier.

I tried to reproduce this with TestDistributedSearch myself, but for some reason it seems to be fine. Perhaps the Jetty configuration differs from my Tomcat configuration. I didn't find any parameter to tell Tomcat which default encoding to use if the Content-Type header doesn't specify one, though.

The minimal change I had to make to get it working was to add a line that sets the Content-Type header explicitly, i.e.

{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
post.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
...
{code}

This probably won't work with multi-part requests though. I'm not sure what the right way to handle this would be. The stub Content-Type header is set by httpclient when the method is executed, i.e. there's no way to let httpclient figure out the first part and then append the charset in CommonsHttpSolrServer.

Some other things I've noticed:

* Just before the content charset is set, the parameters of the POST request are populated. If the value for a parameter is null, the code attempts to add a null parameter.
This, however, will cause an IllegalArgumentException from httpclient (cf. [http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String, java.lang.String)]).
* TestDistributedSearch does not exercise the code to refine facet counts. Adding another facet request with facet.limit=1 redresses this.

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>             Fix For: 1.3
>
>         Attachments: distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed_add_tests_for_intended_behavior.patch,
> distributed_facet_count_bugfix.patch, distributed_pjaol.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch,
> fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
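
For reference, the encode-as-UTF-8 / decode-as-ISO-8859-1 mismatch described in the comment above can be reproduced without Solr, httpclient, or a servlet container. The following is only a sketch using java.net.URLEncoder and java.net.URLDecoder; the class name, method names, and sample value are made up for illustration, and the decode step merely approximates what a receiver might do when the Content-Type header carries no charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Solr or httpclient.
public class CharsetMismatchDemo {

    // Form-encode a value as a sender configured for UTF-8 would.
    public static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM must support.
            throw new IllegalStateException(e);
        }
    }

    // Decode a form-encoded value using the charset the receiver assumes.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";               // "cafe" with an acute accent
        String wire = encodeUtf8(original);           // caf%C3%A9

        // Receiver honours charset=UTF-8: the value round-trips.
        String good = decode(wire, "UTF-8");

        // Receiver falls back to ISO-8859-1: each UTF-8 byte becomes
        // its own character, so the accented letter turns into two.
        String garbled = decode(wire, "ISO-8859-1");

        System.out.println(wire);
        System.out.println(good.equals(original));                 // true
        System.out.println(garbled.equals("caf\u00c3\u00a9"));   // true
    }
}
```

The non-ASCII characters are written as Unicode escapes so that the result does not depend on the source file or console encoding.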