[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605868#action_12605868 ]
Lars Kotthoff commented on SOLR-303:
------------------------------------
Yonik, thanks for taking a look at it.
I've investigated this issue further and I believe I now know the root cause.
The line
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
...
{code}
tells the *sender* to encode the data as UTF-8. How the *receiver* decodes the
data depends on the charset given in the Content-Type header. This header is
currently added automatically by httpclient and, as the netcat log shows, it is
just "application/x-www-form-urlencoded", i.e. it carries no charset.
The default charset is ISO-8859-1 (cf.
[http://hc.apache.org/httpclient-3.x/charencodings.html]). So the data is
*encoded* as UTF-8 but *decoded* as ISO-8859-1, which causes the effect I
described earlier.
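To illustrate the mismatch, here is a small, self-contained Java sketch (not
Solr or httpclient code) that percent-encodes a form value as UTF-8, the way
the POST body is built, and then decodes it as ISO-8859-1, the way a receiver
falls back when the Content-Type has no charset. The sample value "Müller" is
an arbitrary assumption, and the Charset overloads of URLEncoder/URLDecoder
used here need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encodes a form value with the sender's charset and decodes it with the
    // receiver's charset, mimicking an application/x-www-form-urlencoded body.
    static String roundTrip(String value, Charset senderCharset, Charset receiverCharset) {
        String body = URLEncoder.encode(value, senderCharset);
        return URLDecoder.decode(body, receiverCharset);
    }

    public static void main(String[] args) {
        // Hypothetical non-ASCII term containing a u-umlaut.
        String value = "M\u00fcller";
        // Charsets agree: the value survives intact.
        System.out.println(roundTrip(value, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Charsets disagree: the two UTF-8 bytes of the umlaut (0xC3 0xBC)
        // are read as two separate ISO-8859-1 characters.
        System.out.println(roundTrip(value, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The second line prints "MÃ¼ller", the same kind of garbling seen when a
UTF-8 body is decoded with the ISO-8859-1 default.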
I tried to reproduce this with TestDistributedSearch myself, but for some
reason it works fine there. Perhaps the Jetty configuration differs from my
Tomcat configuration. I didn't find any parameter that tells Tomcat which
default encoding to use when the Content-Type header doesn't specify one,
though.
The minimal change needed to make it work was to add a line that sets the
Content-Type header explicitly, i.e.
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
post.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
...
{code}
This probably won't work for multipart requests, though, since they need their
own Content-Type (multipart/form-data with a boundary), and I'm not sure what
the right way to handle them would be. httpclient sets the stub Content-Type
header when the method is executed, so there's no way to let httpclient
determine the media type first and then append the charset in
CommonsHttpSolrServer.
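If the final media type were known at the point where the header is set, a
slightly more general alternative to hardcoding the whole header would be to
append the charset only when none is present. The helper below is a
hypothetical sketch (withCharset is not part of httpclient or Solr), and it
does not address multipart requests, whose parts carry their own Content-Type
headers.

```java
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: appends "; charset=<charset>" to a Content-Type
    // value unless a charset parameter is already present (case-insensitive).
    static String withCharset(String contentType, String charset) {
        for (String param : contentType.split(";")) {
            if (param.trim().toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return contentType; // keep an explicitly chosen charset
            }
        }
        return contentType + "; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(withCharset("application/x-www-form-urlencoded", "UTF-8"));
        System.out.println(withCharset("text/plain; charset=ISO-8859-1", "UTF-8"));
    }
}
```

For the form-encoded case this yields the same header value as the explicit
setRequestHeader call above, while leaving an existing charset untouched.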
Some other things I've noticed:
* Just before the content charset is set, the parameters of the POST request
are populated. If the value of a parameter is null, the code attempts to add
a parameter with a null value. This, however, causes an
IllegalArgumentException from httpclient (cf.
[http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String, java.lang.String)]).
* TestDistributedSearch does not exercise the code that refines facet counts.
Adding another facet request with facet.limit=1 fixes this.
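For the null-parameter problem, one option is to skip null values before they
reach PostMethod.addParameter(String, String). The sketch below is
self-contained and assumes, hypothetically, that the request parameters are
available as a Map<String, String[]>; the class and method names are
illustrative, not Solr's.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullSafeParams {
    // Flattens multi-valued parameters into name/value pairs, dropping null
    // values that PostMethod.addParameter would reject with an
    // IllegalArgumentException.
    static List<String[]> toPostPairs(Map<String, String[]> params) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            String[] values = e.getValue();
            if (values == null) {
                continue; // parameter with no values at all
            }
            for (String v : values) {
                if (v != null) {
                    pairs.add(new String[] {e.getKey(), v});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("q", new String[] {"*:*"});
        params.put("fq", new String[] {null});               // null value, skipped
        params.put("facet.field", new String[] {"cat", null}); // null value, skipped
        for (String[] p : toPostPairs(params)) {
            // At the real call site this would be something like
            // post.addParameter(p[0], p[1]).
            System.out.println(p[0] + "=" + p[1]);
        }
    }
}
```

Whether a null value should be dropped or sent as an empty string is a design
choice; dropping it simply avoids the exception.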
> Distributed Search over HTTP
> ----------------------------
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Sharad Agarwal
> Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed_add_tests_for_intended_behavior.patch,
> distributed_facet_count_bugfix.patch, distributed_pjaol.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch,
> fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch
--
This message is automatically generated by JIRA.