[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557535#action_12557535 ]
Yonik Seeley commented on SOLR-303:
-----------------------------------

> one solution i've seen to mitigate problems like this in the past is to
> compute a higher "limit" when querying the individual shards

Yep. Eventually should be configurable too. We should definitely do some "over requesting" for very small limits. Expanding the limit too much can be expensive though (CPU cost partially depends on the algorithm). I think users should even be able to disable refinement queries if they just want an estimate.

Note that it's possible to tell if there even could be stealth terms out there... we maintain the smallest count we get from each shard, so that serves as the largest count any unknown term could have. Add all those together to see if it's possible an unknown term could make it to the top terms. This means you could do a request with a smaller limit, and then re-request with a larger limit if necessary.

Beyond that, it becomes unclear what the best strategy is. Worst case scenario: If the top N facets get down to a count of 1, then *any* unknown term could bump another higher. Requesting all terms with count>=1 from each shard isn't something I want to ponder.

Anyway, a colleague informs me that this is the way at least one other major search vendor does things (counts are exact for terms shown, but it is theoretically possible to miss a term).

> Distributed Search over HTTP
> ----------------------------
>
>          Key: SOLR-303
>          URL: https://issues.apache.org/jira/browse/SOLR-303
>      Project: Solr
>   Issue Type: New Feature
>   Components: search
>     Reporter: Sharad Agarwal
>     Assignee: Yonik Seeley
>  Attachments: distributed.patch, distributed.patch, distributed.patch,
>               distributed.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
>               fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
>               fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
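
For illustration only, the "stealth term" check described in the comment above could be sketched roughly as follows. This is not code from any of the attached patches; the function names (`merge_shard_facets`, `may_miss_term`) and the dict-per-shard input are assumptions made up for this sketch. It sums the smallest count returned by each shard to bound what any unknown term could reach, then compares that bound against the N-th best merged count. Ties are treated as not displacing a term, which is also an assumption.

```python
from collections import Counter


def merge_shard_facets(shard_results, shard_limit):
    """Merge per-shard facet counts and bound the count of any unseen term.

    shard_results: list of dicts (term -> count), one per shard, each holding
    at most `shard_limit` top terms. Hypothetical input shape for this sketch.
    """
    merged = Counter()
    unseen_bound = 0
    for counts in shard_results:
        merged.update(counts)  # adds counts term by term
        # A shard that returned fewer than `shard_limit` terms has no hidden
        # terms, so it contributes nothing to the bound.
        if counts and len(counts) >= shard_limit:
            unseen_bound += min(counts.values())
    return merged, unseen_bound


def may_miss_term(shard_results, top_n, shard_limit):
    """Return True if a term unseen on every shard could still beat the
    current N-th merged term (strictly; ties are ignored in this sketch)."""
    merged, unseen_bound = merge_shard_facets(shard_results, shard_limit)
    top = merged.most_common(top_n)
    if len(top) < top_n:
        # Fewer than N known terms: any hidden term would make the list.
        return unseen_bound > 0
    nth_count = top[-1][1]
    return unseen_bound > nth_count


if __name__ == "__main__":
    shards = [{"a": 10, "b": 5}, {"a": 8, "c": 4}]
    print(may_miss_term(shards, top_n=1, shard_limit=2))
    print(may_miss_term(shards, top_n=2, shard_limit=2))
```

If the check comes back true, the request could be repeated with a larger per-shard limit, as suggested in the comment. The merged counts here are pre-refinement lower bounds for terms that some shards did not return, so the sketch errs toward reporting a possible miss.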