We're starting work on adding backup requests <http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf> to the ShardHandler. Roughly something like:
1. Send requests to 100 shards.
2. Wait for results from 75 to come back.
3. Wait for either a) the other 25 to come back or b) 20% more time to elapse.
4. If any shards have still not returned results, send a second request to a different server for each of the missing shards.

I want to be sure I understand the ShardHandler contract correctly before getting started. My understanding is:

-- ShardHandler#take methods <https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/ShardHandler.java#L25:L26> can be called after several different ShardRequests have been submitted <https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/ShardHandler.java#L24>.
-- ShardHandler#takeXXX is then called in a loop, each call returning the ShardResponse from the last shard to respond for a given ShardRequest.
-- When ShardHandler#takeXXX returns null, the SearchHandler <https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L277:L367> proceeds <https://github.com/apache/lucene-solr/blob/dff38c2051ba26f928687139218bbc43e9004ebe/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java#L333>.

For example, the flow could look like:

    shardHandler.submit(slowGroupingRequest, "shard1", groupingParams);
    shardHandler.submit(slowGroupingRequest, "shard2", groupingParams);
    shardHandler.submit(fastFacetRefinementRequest, "shard1", facetParams);
    shardHandler.submit(fastFacetRefinementRequest, "shard2", facetParams);
    shardHandler.takeCompletedOrError(); // returns fastFacetRefinementRequest with responses
    shardHandler.takeCompletedOrError(); // returns slowGroupingRequest with responses
    shardHandler.takeCompletedOrError(); // returns null, SearchHandler exits take loop

Does that seem like a correct understanding of the SearchHandler->ShardHandler interaction?

If so, it seems that to make backup requests work we'd need to fan out individual ShardRequests independently, each with its own completion service and pending queue. Does that sound right?

Thanks!
--Gregg
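
P.S. To make the four backup-request steps concrete, here is a rough standalone sketch in plain Java. It deliberately does not touch the ShardHandler API: BackupRequestSketch, query, collect, demo and the send callback are all made-up names, the replica names are invented, and "first answer per shard wins" is an assumption about how duplicate responses would be handled. The 75% and 20% thresholds are the ones from the list above. It uses one completion service for a single fanned-out request, which is the shape I have in mind per ShardRequest.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;

class BackupRequestSketch {

    // 'send' is a hypothetical stand-in for the real transport: (shard, replica) -> response body.
    static Map<String, String> query(Map<String, List<String>> replicasByShard,
                                     BiFunction<String, String, String> send,
                                     ExecutorService pool) throws InterruptedException {
        CompletionService<Map.Entry<String, String>> cs = new ExecutorCompletionService<>(pool);
        Map<String, String> results = new LinkedHashMap<>();
        int total = replicasByShard.size();
        int quorum = (int) Math.ceil(total * 0.75);
        int pending = 0; // requests submitted but not yet taken from the completion service
        long start = System.nanoTime();

        // 1. Send the primary request for every shard to its first replica.
        for (Map.Entry<String, List<String>> e : replicasByShard.entrySet()) {
            String shard = e.getKey();
            String replica = e.getValue().get(0);
            cs.submit(() -> Map.entry(shard, send.apply(shard, replica)));
            pending++;
        }
        // 2. Block until 75% of the shards have answered.
        while (results.size() < quorum && pending > 0) {
            collect(cs.take(), results);
            pending--;
        }
        // 3. Wait for the stragglers, but only for 20% more of the time spent so far.
        long deadline = System.nanoTime() + (long) ((System.nanoTime() - start) * 0.20);
        while (results.size() < total && pending > 0) {
            Future<Map.Entry<String, String>> f =
                    cs.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
            if (f == null) break; // extra time elapsed
            collect(f, results);
            pending--;
        }
        // 4. Send a backup request to a second replica for each shard still missing.
        for (Map.Entry<String, List<String>> e : replicasByShard.entrySet()) {
            String shard = e.getKey();
            if (!results.containsKey(shard) && e.getValue().size() > 1) {
                String backup = e.getValue().get(1);
                cs.submit(() -> Map.entry(shard, send.apply(shard, backup)));
                pending++;
            }
        }
        // Whichever of primary or backup answers first for a shard wins.
        while (results.size() < total && pending > 0) {
            collect(cs.take(), results);
            pending--;
        }
        return results;
    }

    private static void collect(Future<Map.Entry<String, String>> f, Map<String, String> results)
            throws InterruptedException {
        try {
            Map.Entry<String, String> r = f.get();
            results.putIfAbsent(r.getKey(), r.getValue());
        } catch (ExecutionException ex) {
            // A failed request is simply dropped in this sketch.
        }
    }

    // Four shards with two replicas each; the primary replica of shard4 is artificially slow.
    static Map<String, String> demo() {
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        for (int i = 1; i <= 4; i++) replicas.put("shard" + i, List.of("replicaA", "replicaB"));
        BiFunction<String, String, String> send = (shard, replica) -> {
            boolean slow = shard.equals("shard4") && replica.equals("replicaA");
            try {
                Thread.sleep(slow ? 5000 : 20);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ie);
            }
            return shard + "@" + replica;
        };
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            return query(replicas, send, pool);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        } finally {
            pool.shutdownNow(); // interrupts the slow primary that lost the race
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, shard4 should end up answered by replicaB, since its primary is still sleeping when the backup comes back. The open question for the real thing is whether this bookkeeping can live inside a single ShardHandler or has to be kept per ShardRequest.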