[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565158#action_12565158 ]
Karsten Sperling commented on SOLR-236:
---------------------------------------

NegatedDocSet was introduced because the filter logic expects to apply a number of filters to a result using the intersection operation. Introducing a negated DocSet was much easier than supporting both intersection and and-not filters.

NegatedDocSet does not support iteration because the complement of a finite set is (at least theoretically) infinite. In practice the negated set could be bounded by the known maximum document id, but this would probably not be very efficient. More importantly, it is simply never necessary to iterate over the elements of a NegatedDocSet: we know that the end result of all DocSet operations is a finite set of results, not an infinite one, so a NegatedDocSet will only ever be used to "subtract" from a finite DocSet. As Yonik has pointed out, operations on a NegatedDocSet can be rewritten as (different) operations on the set being negated, and the operation methods inside NegatedDocSet do exactly this.

The bug occurs because of the naive way binary set operations are dispatched: DocSet clients simply call, e.g., set1.intersection(set2), which arbitrarily leaves the choice of implementation to whatever logic the class of set1 defines. BitDocSet currently does not know about NegatedDocSet, so it doesn't perform the necessary rewriting or delegation to NegatedDocSet.

Rather than requiring every DocSet subclass to know about all the others (in the absence of language support for multiple dispatch), I think it would be better to centralize this knowledge in a single class, DocSetOp, whose static methods select the appropriate implementation for an operation based on the types of _both_ parameters. The client code could be changed to call DocSetOp.intersection(a, b) instead of a.intersection(b), but this would involve changing the DocSet interface.
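To make the dispatch idea concrete, here is a minimal Java sketch of a DocSetOp-style dispatcher. It assumes heavily simplified, hypothetical stand-ins for DocSet, BitDocSet and NegatedDocSet (a bare marker interface, a java.util.BitSet-backed finite set, and a wrapper around the negated set); the real Solr classes have different and much richer APIs. Each static method inspects the runtime types of both operands and applies the rewriting rules for negated sets (for example, ~A ∩ B = B \ A), so no DocSet subclass needs to know about any other.

```java
import java.util.BitSet;

/**
 * Sketch of centralized, two-operand dispatch for DocSet operations.
 * DocSet, BitDocSet and NegatedDocSet here are hypothetical simplified
 * stand-ins, not Solr's actual classes.
 */
public class DocSetOpSketch {

    /** Marker type for a set of document ids (the real interface is much larger). */
    interface DocSet {}

    /** A finite set of document ids backed by java.util.BitSet. */
    static final class BitDocSet implements DocSet {
        final BitSet bits;
        BitDocSet(BitSet bits) { this.bits = bits; }
        boolean contains(int doc) { return bits.get(doc); }
        int size() { return bits.cardinality(); }
    }

    /** The complement of another set; intentionally offers no iteration. */
    static final class NegatedDocSet implements DocSet {
        final DocSet inner;
        NegatedDocSet(DocSet inner) { this.inner = inner; }
    }

    /** Chooses an implementation from the runtime types of BOTH operands. */
    static final class DocSetOp {
        private DocSetOp() {}

        static DocSet intersection(DocSet a, DocSet b) {
            boolean na = isNeg(a), nb = isNeg(b);
            if (na && nb) return negate(union(inner(a), inner(b))); // ~A & ~B = ~(A | B)
            if (na) return andNot(b, inner(a));                      // ~A & B  = B \ A
            if (nb) return andNot(a, inner(b));                      // A & ~B  = A \ B
            BitSet r = copy(a);
            r.and(bits(b));
            return new BitDocSet(r);
        }

        static DocSet union(DocSet a, DocSet b) {
            boolean na = isNeg(a), nb = isNeg(b);
            if (na && nb) return negate(intersection(inner(a), inner(b))); // ~A | ~B = ~(A & B)
            if (na) return negate(andNot(inner(a), b));                    // ~A | B  = ~(A \ B)
            if (nb) return negate(andNot(inner(b), a));                    // A | ~B  = ~(B \ A)
            BitSet r = copy(a);
            r.or(bits(b));
            return new BitDocSet(r);
        }

        static DocSet andNot(DocSet a, DocSet b) {
            boolean na = isNeg(a), nb = isNeg(b);
            if (na && nb) return andNot(inner(b), inner(a)); // ~A \ ~B = B \ A
            if (na) return negate(union(inner(a), b));       // ~A \ B  = ~(A | B)
            if (nb) return intersection(a, inner(b));        // A \ ~B  = A & B
            BitSet r = copy(a);
            r.andNot(bits(b));
            return new BitDocSet(r);
        }

        private static boolean isNeg(DocSet s) { return s instanceof NegatedDocSet; }
        private static DocSet inner(DocSet s) { return ((NegatedDocSet) s).inner; }
        private static BitSet bits(DocSet s) { return ((BitDocSet) s).bits; }
        private static BitSet copy(DocSet s) { return (BitSet) bits(s).clone(); }

        /** Wraps a set in a negation, collapsing a double negation instead of nesting it. */
        private static DocSet negate(DocSet s) {
            return isNeg(s) ? inner(s) : new NegatedDocSet(s);
        }
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(0, 10);                      // documents 0..9
        BitSet excluded = new BitSet();
        excluded.set(2);
        excluded.set(3);
        DocSet notExcluded = new NegatedDocSet(new BitDocSet(excluded));
        // Argument order no longer matters: both calls take the same rewriting path.
        BitDocSet r1 = (BitDocSet) DocSetOp.intersection(new BitDocSet(filter), notExcluded);
        BitDocSet r2 = (BitDocSet) DocSetOp.intersection(notExcluded, new BitDocSet(filter));
        System.out.println(r1.bits + " " + r2.bits); // prints {0, 1, 4, 5, 6, 7, 8, 9} twice
    }
}
```

With this shape, intersecting a finite filter with a negated one gives the same finite result whichever operand comes first, which is precisely the case that goes wrong when a BitDocSet sits on the left-hand side and has to handle a NegatedDocSet it knows nothing about.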
A backwards-compatible solution would be to simply make DocSetBase.intersection() final and have it delegate to DocSetOp.intersection().

> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Attachments: field-collapsing-extended-592129.patch,
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
>
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>
> The implementation adds 3 new query parameters (SolrParams):
> - "collapse.field" to choose the field used to group results
> - "collapse.type" normal (default value) or adjacent
> - "collapse.max" to select how many continuous results are allowed before
> collapsing
>
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
>
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>
> P.S.: Feedback and misspelling corrections are welcome ;-)
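The quoted description names the collapse parameters but does not spell out their exact semantics. The Java sketch below illustrates one plausible reading, which is an assumption and not the patch's actual algorithm: "normal" counts every result sharing a collapse.field value across the whole ranked list, "adjacent" only counts consecutive runs of the same value, and collapse.max is how many results per value (or per run) are kept before the rest are collapsed. The class CollapseSketch and its collapse() method are hypothetical names, and the sketch works on an in-memory list of field values rather than on Solr's DocList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical illustration of collapse.type (normal vs adjacent) and collapse.max.
 * Not the patch's code: it operates on field values listed in ranked order.
 */
public class CollapseSketch {

    /**
     * Returns the ranked positions that survive collapsing.
     *
     * @param fieldValues value of the collapse field for each result, in ranked order
     * @param adjacent    true for "adjacent" (only consecutive runs are collapsed),
     *                    false for "normal" (all results sharing a value are counted)
     * @param max         how many results with the same value are kept before collapsing
     */
    static List<Integer> collapse(List<String> fieldValues, boolean adjacent, int max) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>(); // normal mode: count per value so far
        String prev = null;                          // adjacent mode: previous result's value
        int run = 0;                                 // adjacent mode: length of current run
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            int count;
            if (adjacent) {
                run = (i > 0 && Objects.equals(value, prev)) ? run + 1 : 1;
                prev = value;
                count = run;
            } else {
                count = seen.merge(value, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a", "a", "a", "b", "a");
        System.out.println(collapse(sites, true, 1));  // [0, 3, 4]
        System.out.println(collapse(sites, false, 1)); // [0, 3]
    }
}
```

With max = 1, the "a" result at position 4 survives in adjacent mode because the "b" result breaks the run, but it is collapsed in normal mode because an "a" result has already been kept.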