[jira] Commented: (SOLR-236) Field collapsing

Charles Hornberger (JIRA) Mon, 04 Feb 2008 11:02:16 -0800

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565468#action_12565468 ]


Charles Hornberger commented on SOLR-236:
-----------------------------------------

bq. However, instead of requiring each and every DocSet subclass to know about 
all other ones (and in the absence of language support for multiple dispatch), 
I think it would be better to centralize this knowledge in a single class 
DocSetOp with static methods that selects the appropriate implementation for an 
operation based on the type of _both_ parameters.

+1 for this ... whether or not NegatedDocSet is part of the final 
implementation of this feature. FWIW, I just noticed that there's another bug 
lurking in BitDocSet.andNot(), which will fail if a NegatedDocSet is passed in. 
It seems to me that it might be easier -- at least for me -- to 
read/write/extend a test suite that exercised all the paths thru DocSetOp, than 
to write a set of tests that exercised all the paths thru DocSetBase and its 
subclasses.
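
The kind of exhaustive test described above might look like the following sketch. To stay short, it assumes a hypothetical representation (a BitSet plus a "negated" flag) in which each pair of flags corresponds to one dispatch path. Every path of every operation is compared against a brute-force, per-document evaluation over a small range of doc ids. Docs, Ops and OpsExhaustiveTest are illustrative names, not Solr classes.

```java
import java.util.BitSet;
import java.util.function.BiFunction;

// Hypothetical compact model: the set is 'bits', or its complement when negated == true.
final class Docs {
    final BitSet bits;
    final boolean negated;
    Docs(BitSet bits, boolean negated) { this.bits = bits; this.negated = negated; }
    boolean exists(int doc) { return bits.get(doc) != negated; }
}

final class Ops {
    private Ops() {}

    private static BitSet copy(BitSet s) { return (BitSet) s.clone(); }

    static Docs negate(Docs s) { return new Docs(s.bits, !s.negated); }

    // One branch per (a.negated, b.negated) pair: these are the "paths" under test.
    static Docs intersection(Docs a, Docs b) {
        BitSet r;
        if (!a.negated && !b.negated) { r = copy(a.bits); r.and(b.bits);    return new Docs(r, false); }
        if (!a.negated)               { r = copy(a.bits); r.andNot(b.bits); return new Docs(r, false); }
        if (!b.negated)               { r = copy(b.bits); r.andNot(a.bits); return new Docs(r, false); }
        r = copy(a.bits); r.or(b.bits);                                      return new Docs(r, true);
    }

    // De Morgan: a OR b == NOT (NOT a AND NOT b)
    static Docs union(Docs a, Docs b) { return negate(intersection(negate(a), negate(b))); }

    static Docs andNot(Docs a, Docs b) { return intersection(a, negate(b)); }
}

final class OpsExhaustiveTest {
    private OpsExhaustiveTest() {}

    // Checks every flag combination of every operation against a brute-force
    // per-document evaluation; returns the number of (operation, flags) cases checked.
    static int run(int universe) {
        BitSet x = new BitSet(); x.set(1); x.set(2); x.set(4);
        BitSet y = new BitSet(); y.set(2); y.set(3);
        int cases = 0;
        for (boolean na : new boolean[] {false, true}) {
            for (boolean nb : new boolean[] {false, true}) {
                Docs a = new Docs(x, na), b = new Docs(y, nb);
                cases += check("intersection", Ops.intersection(a, b), a, b, (p, q) -> p && q, universe);
                cases += check("union",        Ops.union(a, b),        a, b, (p, q) -> p || q, universe);
                cases += check("andNot",       Ops.andNot(a, b),       a, b, (p, q) -> p && !q, universe);
            }
        }
        return cases;
    }

    private static int check(String op, Docs actual, Docs a, Docs b,
                             BiFunction<Boolean, Boolean, Boolean> expected, int universe) {
        for (int doc = 0; doc < universe; doc++) {
            boolean want = expected.apply(a.exists(doc), b.exists(doc));
            if (actual.exists(doc) != want) {
                throw new AssertionError(op + " wrong for doc " + doc
                        + " (a.negated=" + a.negated + ", b.negated=" + b.negated + ")");
            }
        }
        return 1;
    }
}
```

Here OpsExhaustiveTest.run(8) covers 2 x 2 flag combinations for each of the three operations (12 cases) and throws an AssertionError naming the first failing path. The (plain, negated) andNot case is the combination the BitDocSet.andNot() bug falls into.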

Also, I think that maybe there's a clear distinction to be made between
intrinsic operations on a set (add(), exists(), et al.) and ones that involve
another set (intersection(), union(), andNot()). Not sure it's a useful one,
but it makes sense to me. I don't know, though, whether it makes sense to go
further than that and say -- as the current implementation of NegatedDocSet
implies -- that there are some set operations (iterator() and size()) that are
in fact optional.
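
One way to keep iterator() and size() from being optional is sketched below. It assumes that a negated set knows the index's maxDoc (one more than the largest document id), so its complement is finite. IntrinsicDocSet and BoundedNegatedDocSet are hypothetical names. The interface holds only the single-set operations and leaves the binary ones to something like DocSetOp.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical interface with only the intrinsic, single-set operations;
// binary operations (intersection, union, andNot) would live elsewhere.
interface IntrinsicDocSet extends Iterable<Integer> {
    void add(int doc);
    boolean exists(int doc);
    int size();
}

// Complement of 'excluded' within the id range [0, maxDoc) (assumed bound).
final class BoundedNegatedDocSet implements IntrinsicDocSet {
    private final BitSet excluded;
    private final int maxDoc;

    BoundedNegatedDocSet(BitSet excluded, int maxDoc) {
        this.excluded = (BitSet) excluded.clone();
        this.maxDoc = maxDoc;
    }

    // Adding a doc to a complement means no longer excluding it.
    public void add(int doc) { excluded.clear(doc); }

    public boolean exists(int doc) {
        return doc >= 0 && doc < maxDoc && !excluded.get(doc);
    }

    // Finite only because the universe is bounded by maxDoc.
    public int size() {
        return maxDoc - excluded.get(0, maxDoc).cardinality();
    }

    // Walks the clear bits of 'excluded' below maxDoc, in increasing order.
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = excluded.nextClearBit(0);

            public boolean hasNext() { return next < maxDoc; }

            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                int doc = next;
                next = excluded.nextClearBit(doc + 1);
                return doc;
            }
        };
    }
}
```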

Off the top of my head: Would it be simpler to just add a {{filterType}}
flag to the getDocList*() family of methods in SolrSearchInterface to cause it
to call {{a.andNot(b)}} rather than {{a.intersection(b)}} when applying {{b}}
as a filter? (I'm really completely ignorant -- or nearly completely -- of how
the search code works, so feel free not to dignify this with a response if it's
a useless idea ... :-))
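
The {{filterType}} suggestion roughly amounts to the following sketch on plain java.util.BitSet. FilterType and FilterApplier.apply are hypothetical names, not part of Solr's search API. The negation lives in the choice of operation rather than in a special DocSet subclass.

```java
import java.util.BitSet;

// Hypothetical flag: whether a filter keeps or removes the documents it matches.
enum FilterType { INCLUDE, EXCLUDE }

final class FilterApplier {
    private FilterApplier() {}

    // Applies one filter to a candidate set without mutating either input:
    // INCLUDE behaves like a.intersection(b), EXCLUDE like a.andNot(b).
    static BitSet apply(BitSet candidates, BitSet filter, FilterType type) {
        BitSet result = (BitSet) candidates.clone();
        switch (type) {
            case INCLUDE: result.and(filter);    break;
            case EXCLUDE: result.andNot(filter); break;
            default: throw new IllegalArgumentException("unknown filter type: " + type);
        }
        return result;
    }
}
```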

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds three new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
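
The following rough sketch shows one possible way the three parameters could interact. It is based on one reading of the description above, not on the attached patches. With collapse.type=adjacent, only runs of consecutive hits that share a field value are cut down to collapse.max entries. With normal, at most collapse.max hits per value are kept across the whole ranked list. Hit and Collapser are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical ranked hit: a document id plus its value for the collapse.field field.
final class Hit {
    final String id;
    final String value;
    Hit(String id, String value) { this.id = id; this.value = value; }
}

final class Collapser {
    private Collapser() {}

    // Returns surviving ids in original rank order. 'type' mirrors collapse.type
    // (null is treated as the default, "normal"); 'max' mirrors collapse.max.
    static List<String> collapse(List<Hit> ranked, String type, int max) {
        if (max < 1) throw new IllegalArgumentException("max must be >= 1");
        List<String> kept = new ArrayList<>();
        if ("adjacent".equals(type)) {
            String runValue = null;
            int runLength = 0;
            for (Hit h : ranked) {
                // A new run starts at the first hit or whenever the value changes.
                if (runLength == 0 || !Objects.equals(h.value, runValue)) {
                    runValue = h.value;
                    runLength = 0;
                }
                runLength++;
                if (runLength <= max) kept.add(h.id);   // later hits in the run are collapsed
            }
        } else if (type == null || "normal".equals(type)) {
            Map<String, Integer> perValue = new HashMap<>();
            for (Hit h : ranked) {
                int count = perValue.merge(h.value, 1, Integer::sum);
                if (count <= max) kept.add(h.id);       // hits beyond max for this value are collapsed
            }
        } else {
            throw new IllegalArgumentException("unknown collapse type: " + type);
        }
        return kept;
    }
}
```

Consider hits a1, a2, a3 (value x), b1 (value y), then a4, a5 (value x) with max 2. Adjacent keeps a1, a2, b1, a4, a5, while normal keeps a1, a2, b1.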

