[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557531#action_12557531
 ] 

Hoss Man commented on SOLR-303:
-------------------------------

bq. OK, this version patches cleanly and includes some distributed faceting 
code.

I haven't looked at it ... but holy freaking cow that's cool.

bq. Note that it is theoretically possible to miss terms. A term could be just 
below the threshold of each shard (and thus not returned by any shard), but the 
total count could boost it in the top. This could be rectified by retrieving 
all terms above a specified count, but it could be expensive. The counts that 
are currently returned are exact.

One solution I've seen to mitigate problems like this in the past is to compute 
a higher "limit" when querying the individual shards. Someone somewhere 
suggested that n**2 is a good approach (but they may have been talking out of 
their ass), so if the initial request says facet.limit=5, the individual shards 
would be queried with facet.limit=25 ... but you'd also still want to use 
refinement requests.
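
To make the over-request idea concrete, here is a rough sketch in Python. It is not the code in the patch: the helper names (shard_top_terms, merge_facets), the in-memory dicts standing in for shards, and the sample counts are all made up for illustration. The refinement step is modelled as simply asking every shard for its exact count of each candidate term.

```python
from collections import Counter

def shard_top_terms(shard_counts, limit):
    """Return one shard's top `limit` (term, count) pairs, highest count first."""
    ranked = sorted(shard_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]

def merge_facets(shards, limit, shard_limit):
    """Merge facet counts from all shards and return the overall top `limit`.

    Each shard is asked for `shard_limit` terms. Every term returned by any
    shard becomes a candidate, and a refinement pass then collects the exact
    count of each candidate from every shard (hypothetical stand-in for a
    refinement request).
    """
    candidates = set()
    for shard in shards:
        for term, _count in shard_top_terms(shard, shard_limit):
            candidates.add(term)

    totals = Counter()
    for term in candidates:
        # Refinement: a shard that did not return the term may still have it.
        totals[term] = sum(shard.get(term, 0) for shard in shards)

    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

# Made-up data: "x" is third on each shard but first overall (8 + 8 = 16).
shards = [
    {"a": 10, "b": 9, "x": 8},
    {"c": 10, "d": 9, "x": 8},
]
limit = 2

same_limit = merge_facets(shards, limit, shard_limit=limit)
over_request = merge_facets(shards, limit, shard_limit=limit ** 2)

print(same_limit)    # "x" is missing: no shard returned it
print(over_request)  # "x" is found once each shard returns limit**2 terms
```

With this data, querying each shard with the requested limit never surfaces "x", while querying with limit**2 does. Over-requesting only makes a miss less likely; a term can still sit just below the larger per-shard cutoff, which is why the counts for whatever terms do come back are worth confirming with refinement.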



> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch
