[ 
https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853168#action_12853168
 ] 

Peter Sturge commented on SOLR-1143:
------------------------------------

This is a cool patch - yes, very useful.

I've found a couple of issues with it, though:

1. When going through the 'waiting for shard replies' loop, because no 
exception is thrown on shard failure, the next block after the loop can throw a 
NullPointerException in {{SearchComponent.handleResponses()}} for any 
SearchComponent that checks shard responses. It could be that this doesn't 
always happen, but it certainly happens in FacetComponent when date_facets are 
turned on.

2. There's a bit of code that sets {{partialResults=true}} if there's at least 
one failure, but it doesn't set it to false if everything's ok. In order for 
the patch to operate, this parameter must have already been present and true, 
otherwise the patch is essentially 'disabled' anyway (problem of using the same 
parameter as input and result).
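
As a rough illustration of issue 1, here is a minimal sketch of the kind of null guard a component needs when a failed shard leaves no reply behind. The classes ShardReplySketch and NullGuardSketch are hypothetical stand-ins written for this note; they are not Solr's ShardResponse or FacetComponent, and the real merge logic is more involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one shard's reply; not a Solr class.
class ShardReplySketch {
    final String shard;
    final Integer facetCount; // null models a shard that failed and returned nothing

    ShardReplySketch(String shard, Integer facetCount) {
        this.shard = shard;
        this.facetCount = facetCount;
    }
}

class NullGuardSketch {
    // Sums the counts from shards that answered and skips shards that did not.
    static int mergeCounts(List<ShardReplySketch> replies) {
        int total = 0;
        for (ShardReplySketch reply : replies) {
            if (reply.facetCount == null) {
                continue; // failed shard: nothing to merge, so avoid dereferencing null
            }
            total += reply.facetCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ShardReplySketch> replies = Arrays.asList(
                new ShardReplySketch("shard0", 3),
                new ShardReplySketch("shard1", null), // simulated failure
                new ShardReplySketch("shard2", 4));
        System.out.println(mergeCounts(replies));
    }
}
```

Without the null check, unboxing the missing count would throw a NullPointerException, which is the same failure mode described in issue 1.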

I've made some modifications to the patch for these and a couple of other 
things:

1. FacetComponent modified to check for a null shard response. Perhaps it would 
be better to check this in SearchHandler.handleResponses(), but then no 
SearchComponents would be notified about failed shards, even if they don't care 
that a shard has failed (is that a good thing?).

2. Added a new CommonParams parameter called FAILED_SHARDS.
{{partialResults}} is now only an input parameter to enable the feature (Note: 
{{partialResults}} is referenced in RequestHandlerBase, but it's not from the 
patch - is this an existing parameter that is used for something else?! If so, 
perhaps the name should be changed to something like {{allowPartialResults}} to 
avoid b/w compat and other potential conflicts).
The output parameter that goes in the response header is now: 
{{failedShards=shard0;shard1;shardn}}. If everything succeeds, there will be no 
failedShards in the response header, otherwise, a list of failed shards is 
given. This is very useful to alert someone/something that a server/network 
needs attention (e.g. a health checker thread could run empty distributed 
searches solely for the purpose of checking status).

3. Changed the detection of a shard request error to be any Exception, rather 
than just ConnectException. This way, any failure is caught and can be 
actioned. Possible TODO: it might be nice to include a short message (Exception 
class name?) in the FAILED_SHARDS parameter about what failed (e.g. 
ConnectException, IOException, etc.). If you like this idea, please say so, and 
I'll include it - i.e. something like: 
{{failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}
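
To make the proposed format concrete, here is a minimal sketch of how the value could be assembled from a map of shard address to the exception that was caught. FailedShardsSketch and its format() method are hypothetical names used only for illustration; they are not part of the patch or of Solr's API, and the ";" and "|" separators are simply the ones from the example above.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of the patch or of Solr.
class FailedShardsSketch {
    // Builds "shard|ExceptionClass;shard|ExceptionClass" in insertion order.
    // An empty result means no shard failed, so the header entry would be omitted.
    static String format(Map<String, Exception> failures) {
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<String, Exception> entry : failures.entrySet()) {
            String cause = entry.getValue().getClass().getSimpleName();
            joined.add(entry.getKey() + "|" + cause);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the shards in the order the failures were recorded.
        Map<String, Exception> failures = new LinkedHashMap<>();
        failures.put("myshard:8983/solr/core0", new ConnectException("refused"));
        failures.put("myothershard:8983/solr/core0", new IOException("reset"));
        System.out.println("failedShards=" + format(failures));
    }
}
```

Recording any Exception, rather than only ConnectException, is what lets the class name in the value distinguish the kinds of failure.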

I'm currently testing these changes in our internal build. In the meantime, any 
comments are greatly appreciated. If there are no objections, I'll add a patch 
update when the dev test run is complete.




> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
>                 Key: SOLR-1143
>                 URL: https://issues.apache.org/jira/browse/SOLR-1143
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Nicolas Dessaigne
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>
> If any shard is down in a distributed search, a ConnectException is thrown.
> Here's a little patch that changes this behaviour: if we can't connect to a 
> shard (ConnectException), we get partial results from the active shards. As 
> for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we 
> set the parameter "partialResults" to true.
> This patch also addresses a problem expressed in the mailing list about a year 
> ago 
> (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html)
> We have a use case that needs this behaviour and we would like to know your 
> thoughts about such a behaviour. Should it be the default behaviour for 
> distributed search?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
