[jira] Commented: (SOLR-303) Distributed Search over HTTP

Yonik Seeley (JIRA) Wed, 09 Jan 2008 19:46:56 -0800

    [ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557535#action_12557535 ]


Yonik Seeley commented on SOLR-303:
-----------------------------------

> one solution i've seen to mitigate problems like this in the past is to 
> compute a higher "limit" when querying the individual shards

Yep.  Eventually this should be configurable too.  We should definitely do some 
"over requesting" for very small limits.  Expanding the limit too much can be 
expensive though (CPU cost partially depends on the algorithm).  I think users 
should even be able to disable refinement queries if they just want an estimate.
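
To make the "over requesting" idea concrete, here is a minimal sketch of how a 
per-shard limit might be derived from the user's limit.  The function name 
shard_limit, the ratio of 1.5, the constant of 10, and the treatment of a 
non-positive limit as "no limit" are all assumptions chosen for illustration; 
nothing in this issue fixes them.

```python
def shard_limit(limit, ratio=1.5, extra=10):
    """Hypothetical over-request heuristic (illustrative values only).

    Scales the requested limit and adds a constant, so very small limits
    receive the largest relative padding.
    """
    if limit <= 0:
        # Assumption: a non-positive limit means "no limit"; pass it through.
        return limit
    return int(limit * ratio) + extra


if __name__ == "__main__":
    for requested in (1, 10, 100):
        print(requested, "->", shard_limit(requested))
```

The additive constant dominates for small limits (a limit of 1 is padded to 
11), while the ratio dominates for large ones, which keeps the extra cost 
roughly proportional there.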

Note that it's possible to tell whether there even could be stealth terms out 
there: we keep the smallest count returned by each shard, which serves as the 
largest count any unknown term could have on that shard.  Add those together 
across shards and compare the sum with the counts of the current top terms to 
see whether an unknown term could make it into the top terms.  This means you 
could do a request with a smaller limit, and then re-request with a larger 
limit only if necessary.
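
The bound described above can be sketched as follows.  This is only an 
illustration under stated assumptions: merge_facets, its parameters, and the 
(term, count) list format are hypothetical, the merged counts are the 
pre-refinement sums, and a tie with the last top term is conservatively 
treated as "could make it".

```python
from collections import Counter


def merge_facets(shard_results, shard_limit, limit):
    """Merge per-shard facet counts and test for possible stealth terms.

    shard_results: one list of (term, count) pairs per shard, each truncated
                   to at most shard_limit entries (hypothetical format).
    Returns (top_terms, may_have_missed_a_term).
    """
    totals = Counter()
    unknown_bound = 0  # largest total count a completely unseen term could have
    for terms in shard_results:
        for term, count in terms:
            totals[term] += count
        # A shard that filled its limit may be hiding more terms, but none of
        # them can exceed the smallest count that shard returned.
        if terms and len(terms) >= shard_limit:
            unknown_bound += min(count for _, count in terms)

    top = totals.most_common(limit)
    if len(top) < limit:
        # Open slots remain, so any hidden term at all would be shown.
        may_miss = unknown_bound > 0
    else:
        may_miss = unknown_bound >= top[-1][1]
    return top, may_miss


if __name__ == "__main__":
    shards = [[("a", 10), ("b", 8)], [("a", 9), ("c", 7)]]
    print(merge_facets(shards, shard_limit=2, limit=2))
```

When the second value is true, the caller can re-request with a larger 
per-shard limit; when it is false, no unseen term can displace the terms 
already shown.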

Beyond that, it becomes unclear what the best strategy is.  Worst case 
scenario: If the top N facets get down to a count of 1, then *any* unknown term 
could bump another higher.  Requesting all terms with count>=1 from each shard 
isn't something I want to ponder. 

Anyway, a colleague informs me that this is the way at least one other major 
search vendor does things (counts are exact for terms shown, but it is 
theoretically possible to miss a term). 


> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
