[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605868#action_12605868
 ] 

Lars Kotthoff commented on SOLR-303:
------------------------------------

Yonik, thanks for taking a look at it.

I've investigated this issue further and I believe I know what the root cause 
is now. The line
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
...
{code}
tells the *sender* to encode the data as UTF-8. The way the *receiver* decodes 
the data depends on whatever is set as charset in the Content-Type header. This 
header is currently added automatically by httpclient and, as you can see in 
the netcat log, is "application/x-www-form-urlencoded", i.e. without a charset. 
The default charset is ISO-8859-1 (cf. 
[http://hc.apache.org/httpclient-3.x/charencodings.html]). So the data is 
*encoded* as UTF-8 but *decoded* as ISO-8859-1, which causes the effect I 
described earlier.
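
To illustrate the mismatch in isolation, here is a minimal sketch that uses only the JDK's URLEncoder/URLDecoder rather than httpclient or a servlet container. The class and method names are made up for this example; it only assumes that the receiver decodes percent-escaped form data with whatever charset it believes applies.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of solrj or httpclient.
public class CharsetMismatchDemo {

    // Encodes a form value with the sender's charset and decodes it with the
    // receiver's charset, mimicking a POST body crossing the wire.
    public static String roundTrip(String value, String senderCharset, String receiverCharset) {
        try {
            String wire = URLEncoder.encode(value, senderCharset);
            return URLDecoder.decode(wire, receiverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // Both sides agree on UTF-8: the value survives.
        System.out.println(original.equals(roundTrip(original, "UTF-8", "UTF-8")));

        // Sender uses UTF-8, receiver assumes ISO-8859-1: each UTF-8 byte
        // becomes its own character, so the value comes back two characters longer
        // than expected for every two-byte sequence.
        String garbled = roundTrip(original, "UTF-8", "ISO-8859-1");
        System.out.println(garbled.length() + " characters instead of " + original.length());
    }
}
```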

I tried to reproduce this with TestDistributedSearch myself, but for some 
reason it seems to be fine. Perhaps the Jetty configuration is different to my 
Tomcat configuration. I didn't find any parameter to tell Tomcat the default 
encoding if the Content-Type header doesn't specify one though.

The minimal change I had to make to get it working was to add a line that sets 
the Content-Type header explicitly, i.e.
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
post.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
...
{code}
This probably won't work with multi-part requests though. I'm not sure what the 
right way to handle this would be. The stub Content-Type header is set by 
httpclient when the method is executed, i.e. there's no way to let httpclient 
figure out the first part and then append the charset in CommonsHttpSolrServer.
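
For what it's worth, the "append the charset" step on its own could look like the sketch below. The helper is hypothetical (it is not part of solrj or httpclient), and it does not solve the timing problem: it assumes the stub Content-Type value is somehow available as a string before the request is sent, which is exactly what httpclient doesn't offer at the moment.

```java
import java.util.Locale;

// Hypothetical helper, not part of solrj or httpclient.
public class ContentTypeCharset {

    // Returns the content type unchanged if it already carries a charset
    // parameter, otherwise appends "; charset=<charset>".
    public static String withCharset(String contentType, String charset) {
        for (String part : contentType.split(";")) {
            if (part.trim().toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return contentType;
            }
        }
        return contentType.trim() + "; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(withCharset("application/x-www-form-urlencoded", "UTF-8"));
        System.out.println(withCharset("text/plain; charset=ISO-8859-1", "UTF-8"));
    }
}
```

Whether appending a charset to a multi-part Content-Type is appropriate at all is a separate question that this sketch does not answer.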

Some other things I've noticed:
* Just before the content charset is set, the parameters of the POST request 
are populated. If the value for a parameter is null, the code attempts to 
add a null parameter. This, however, will cause an IllegalArgumentException from 
httpclient (cf. 
[http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String, java.lang.String)]).
* TestDistributedSearch does not exercise the code to refine facet counts. 
Adding another facet request with facet.limit=1 redresses this.
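
Regarding the null parameter issue in the first item, one possible guard is to skip null values before they ever reach addParameter. The sketch below uses plain collections so it runs without httpclient; the class and method names are made up, and the Map<String, String[]> shape is an assumption about how the parameters are held, not a description of the actual solrj code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of solrj or httpclient.
public class NullParamFilter {

    // Flattens multi-valued parameters into name/value pairs, dropping null
    // value arrays and null individual values.
    public static List<String[]> nonNullPairs(Map<String, String[]> params) {
        List<String[]> pairs = new ArrayList<String[]>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            if (values == null) {
                continue; // nothing to send for this parameter
            }
            for (String value : values) {
                if (value != null) {
                    pairs.add(new String[] { entry.getKey(), value });
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<String, String[]>();
        params.put("q", new String[] { "solr" });
        params.put("fq", new String[] { null, "type:doc" });
        params.put("sort", null);

        // In the real client, each pair would be passed to post.addParameter(name, value).
        for (String[] pair : nonNullPairs(params)) {
            System.out.println(pair[0] + "=" + pair[1]);
        }
    }
}
```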

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>             Fix For: 1.3
>
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
> distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
> fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
