[jira] Commented: (SOLR-236) Field collapsing

Karsten Sperling (JIRA) Sun, 03 Feb 2008 01:32:40 -0800

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565158#action_12565158 ]


Karsten Sperling commented on SOLR-236:
---------------------------------------

NegatedDocSet was introduced because the filter logic applies a number of 
filters to a result by intersecting them. Introducing a negated DocSet was 
much easier than supporting both intersection and and-not type filters.

NegatedDocSet does not support iteration because the negation of a finite set 
is (at least theoretically) infinite. Even though it would in practice be 
possible to limit the negated set via the known maximum document id, this would 
probably not be very efficient. However, it is simply not necessary to ever 
iterate over the elements of a NegatedDocSet, because we know that the 
end-result of all DocSet operations is going to be a finite set of results, not 
an infinite one. A NegatedDocSet will only ever be used to "subtract" from a 
finite DocSet. As Yonik has pointed out, operations on a NegatedDocSet can be 
rewritten as (different) operations on the set being negated. The operation 
methods inside NegatedDocSet do this.
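
To illustrate the kind of rewriting meant here, below is a minimal sketch 
built on java.util.BitSet. It is not the code from the patch: the Docs class, 
its negated flag and the of() helper are hypothetical names made up for this 
example. It only shows that intersecting with a negated set can be expressed 
as and-not on the underlying set, so the negated set is never enumerated.

```java
import java.util.BitSet;

// Hypothetical stand-in for illustration only; this is not Solr's NegatedDocSet.
public class NegatedSetSketch {

    /** A set of document ids; when negated, it means "every id NOT in bits". */
    static final class Docs {
        final BitSet bits;
        final boolean negated;

        Docs(BitSet bits, boolean negated) {
            this.bits = (BitSet) bits.clone();
            this.negated = negated;
        }
    }

    /** Intersection that never enumerates the members of a negated set. */
    static Docs intersection(Docs a, Docs b) {
        if (a.negated && !b.negated) {
            return intersection(b, a); // intersection is commutative
        }
        BitSet r = (BitSet) a.bits.clone();
        if (!a.negated && !b.negated) {
            r.and(b.bits);             // A and B
            return new Docs(r, false);
        }
        if (!a.negated) {
            r.andNot(b.bits);          // A and (not B)  ==  A minus B
            return new Docs(r, false);
        }
        r.or(b.bits);                  // (not A) and (not B)  ==  not (A or B)
        return new Docs(r, true);
    }

    /** Builds a BitSet containing the given ids. */
    static BitSet of(int... ids) {
        BitSet s = new BitSet();
        for (int id : ids) {
            s.set(id);
        }
        return s;
    }

    public static void main(String[] args) {
        Docs result = new Docs(of(1, 2, 3, 4), false);
        Docs filter = new Docs(of(2, 4), true); // "everything except 2 and 4"
        Docs out = intersection(result, filter);
        System.out.println(out.bits + " negated=" + out.negated);
    }
}
```

Only the last case produces a negated result, and that result is again only 
useful for subtracting from a finite set later on.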

The bug occurs because of the naive way binary set operation calls are 
dispatched: DocSet clients simply call e.g. set1.intersection(set2), which 
leaves the choice of implementation entirely to the class of set1. Currently, 
BitDocSet does not know about NegatedDocSet, and hence doesn't perform the 
necessary rewriting or delegation to NegatedDocSet.

However, instead of requiring each and every DocSet subclass to know about all 
other ones (and in the absence of language support for multiple dispatch), I 
think it would be better to centralize this knowledge in a single class, 
DocSetOp, with static methods that select the appropriate implementation for 
an operation based on the types of _both_ parameters. The client code could 
be changed to call DocSetOp.intersection(a, b) instead of a.intersection(b), 
but this would involve changing the DocSet interface. A backwards-compatible 
solution would be to simply have a final DocSetBase.intersection() delegate to 
DocSetOp.intersection().
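
A rough sketch of what I mean, with heavily simplified stand-ins for the real 
classes. The nested DocSet, DocSetBase, BitDocSet and NegatedDocSet types 
below only mirror the names used above and are assumptions for this example, 
not the actual Solr classes or the code in the patch; the bitsOf() helper is 
hypothetical as well.

```java
import java.util.BitSet;

// Simplified, hypothetical stand-ins; not the real Solr classes.
public class DocSetOpSketch {

    interface DocSet {
        DocSet intersection(DocSet other);
    }

    /** Backwards-compatible entry point: a.intersection(b) keeps working. */
    abstract static class DocSetBase implements DocSet {
        @Override
        public final DocSet intersection(DocSet other) {
            return DocSetOp.intersection(this, other);
        }
    }

    static final class BitDocSet extends DocSetBase {
        final BitSet bits;
        BitDocSet(BitSet bits) { this.bits = (BitSet) bits.clone(); }
    }

    static final class NegatedDocSet extends DocSetBase {
        final BitDocSet source; // the set being negated
        NegatedDocSet(BitDocSet source) { this.source = source; }
    }

    /** Central place that picks an implementation from the types of BOTH operands. */
    static final class DocSetOp {
        private DocSetOp() {}

        static DocSet intersection(DocSet a, DocSet b) {
            boolean na = a instanceof NegatedDocSet;
            boolean nb = b instanceof NegatedDocSet;
            BitSet x = bitsOf(a);
            BitSet y = bitsOf(b);
            if (!na && !nb) { x.and(y);    return new BitDocSet(x); } // A and B
            if (!na)        { x.andNot(y); return new BitDocSet(x); } // A minus B
            if (!nb)        { y.andNot(x); return new BitDocSet(y); } // B minus A
            x.or(y);                                                  // not (A or B)
            return new NegatedDocSet(new BitDocSet(x));
        }

        /** Returns a copy of the explicitly stored bits of either kind of set. */
        private static BitSet bitsOf(DocSet d) {
            if (d instanceof NegatedDocSet) {
                return (BitSet) ((NegatedDocSet) d).source.bits.clone();
            }
            if (d instanceof BitDocSet) {
                return (BitSet) ((BitDocSet) d).bits.clone();
            }
            throw new IllegalArgumentException("unsupported DocSet type: " + d.getClass());
        }
    }

    static BitSet of(int... ids) {
        BitSet s = new BitSet();
        for (int id : ids) {
            s.set(id);
        }
        return s;
    }

    public static void main(String[] args) {
        DocSet results = new BitDocSet(of(1, 2, 3, 4));
        DocSet filter = new NegatedDocSet(new BitDocSet(of(2, 4)));
        // Both call orders end up in DocSetOp, so the receiver's class no longer matters.
        System.out.println(((BitDocSet) results.intersection(filter)).bits);
        System.out.println(((BitDocSet) filter.intersection(results)).bits);
    }
}
```

With this arrangement a new DocSet implementation only needs to be taught to 
DocSetOp, not to every existing subclass.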

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
