[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717110#action_12717110 ]

Earwin Burrfoot commented on SOLR-236:
--------------------------------------

I have implemented collapsing on a high-volume project of mine in a much less
flexible, but more practical, manner.

Part I. You have to guarantee that all documents having the same value of the
collapse field are added to the Lucene index as one sequential batch. That
guarantees they get sequential docIds and, with some more work, that they all
end up in the same segment.
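
A minimal Python sketch of Part I, assuming documents are plain dicts with an
`id` key and a hypothetical collapse field named `site`. The helpers
`order_for_indexing` and `assign_doc_ids` are made up for illustration; the
sketch only models add order and sequential docId assignment, not how Lucene
flushes or merges segments (the "some more work" part).

```python
from collections import defaultdict
from typing import Dict, List

Doc = Dict[str, str]


def order_for_indexing(docs: List[Doc], collapse_field: str) -> List[Doc]:
    """Reorder docs so every collapse-field value forms one contiguous batch."""
    groups: Dict[str, List[Doc]] = defaultdict(list)
    for doc in docs:
        groups[doc[collapse_field]].append(doc)
    ordered: List[Doc] = []
    # dicts keep insertion order, so batches come out in first-seen order
    for batch in groups.values():
        ordered.extend(batch)
    return ordered


def assign_doc_ids(ordered_docs: List[Doc]) -> Dict[str, int]:
    """Simulate an index that hands out docIds sequentially in add order."""
    return {doc["id"]: doc_id for doc_id, doc in enumerate(ordered_docs)}


docs = [
    {"id": "1", "site": "a"},
    {"id": "2", "site": "b"},
    {"id": "3", "site": "a"},
    {"id": "4", "site": "b"},
]
ids = assign_doc_ids(order_for_indexing(docs, "site"))
print(ids)  # {'1': 0, '3': 1, '2': 2, '4': 3}
```

Both "a" docs now hold docIds 0-1 and both "b" docs hold 2-3.
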
Part II. During collection you always get docIds in ascending order, so,
thanks to Part I, the docs to be collapsed arrive already grouped by
collapse-field value, even before you drop them into the PriorityQueue to sort
them.
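
Part II then reduces collapsing to a single pass over the collected hits. The
Python sketch below assumes hits arrive as `(doc_id, score)` pairs in ascending
docId order and that `collapse_key[doc_id]` is a hypothetical per-doc lookup of
the collapse-field value (something like a field cache). It keeps the
best-scoring hit per group and offers only that representative to a bounded
min-heap standing in for Lucene's PriorityQueue.

```python
import heapq
from typing import Iterable, List, Optional, Sequence, Tuple

Scored = Tuple[float, int]  # (score, doc_id)


def collapse_collect(hits: Iterable[Tuple[int, float]],
                     collapse_key: Sequence[str],
                     top_n: int) -> List[Scored]:
    """Single-pass collapse over hits that arrive in ascending docId order.

    Because every group occupies a contiguous docId range, a group ends
    exactly when the key changes, so only the current group's best hit is
    held in memory besides the top-N heap.
    """
    heap: List[Scored] = []          # min-heap holding at most top_n representatives
    current_key: Optional[str] = None
    best: Optional[Scored] = None    # best (score, doc_id) of the current group

    def flush() -> None:
        # offer the finished group's representative to the bounded heap
        if best is None or top_n <= 0:
            return
        if len(heap) < top_n:
            heapq.heappush(heap, best)
        elif best > heap[0]:
            heapq.heapreplace(heap, best)

    for doc_id, score in hits:
        key = collapse_key[doc_id]
        if key != current_key:       # group boundary: previous group is complete
            flush()
            current_key, best = key, (score, doc_id)
        elif score > best[0]:
            best = (score, doc_id)
    flush()                          # the last group has no following boundary
    return sorted(heap, reverse=True)


keys = ["a", "a", "a", "b", "b", "c"]
hits = [(0, 1.0), (1, 3.0), (2, 2.0), (3, 0.5), (4, 4.0), (5, 2.5)]
print(collapse_collect(hits, keys, top_n=2))  # [(4.0, 4), (3.0, 1)]
```

No per-group hash table is needed, which is where the near-zero extra cost
comes from.
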

Cons:
You can only collapse on a single field, predetermined at index creation time.
If one document changes, you have to reindex all docs that have the same
collapse-field value, so it's best if you have either low update/add rates or
few documents sharing the same collapse-field value.
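
To make the reindexing cost concrete, here is a toy append-only "index" in
Python. The `ToyIndex` class is hypothetical, not a Lucene API: a delete only
leaves a hole and every add goes to the end, so changing one document means
deleting and re-adding its whole group to keep the group contiguous. It
assumes the update does not change the document's collapse-field value.

```python
from typing import Dict, List, Optional

Doc = Dict[str, str]


class ToyIndex:
    """Append-only toy index: docIds are list positions, a delete leaves a
    hole (None) and every add is appended at the end."""

    def __init__(self, collapse_field: str) -> None:
        self.collapse_field = collapse_field
        self.docs: List[Optional[Doc]] = []

    def add_batch(self, batch: List[Doc]) -> None:
        self.docs.extend(batch)

    def update(self, changed: Doc) -> int:
        """Replace one doc and re-add its whole group; returns docs reindexed.

        Assumes `changed` already exists and keeps its collapse-field value.
        """
        value = changed[self.collapse_field]
        group: List[Doc] = []
        for doc_id, doc in enumerate(self.docs):
            if doc is not None and doc[self.collapse_field] == value:
                # unchanged siblings are re-added too; the changed doc is swapped in
                group.append(changed if doc["id"] == changed["id"] else doc)
                self.docs[doc_id] = None   # delete: leaves a hole
        self.add_batch(group)              # group lands contiguously at the end
        return len(group)


index = ToyIndex("site")
index.add_batch([{"id": "1", "site": "a"}, {"id": "2", "site": "a"},
                 {"id": "3", "site": "b"}])
print(index.update({"id": "1", "site": "a", "title": "edited"}))  # 2
print([d["id"] if d else None for d in index.docs])  # [None, None, '3', '1', '2']
```

Editing one doc in group "a" cost two adds, which is why large groups combined
with frequent updates hurt.
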

Pros:
Compared to a usual search, the extra CPU and memory cost of collapsing is very
close to zero and does not depend on index size or the total number of docs
found.
The same idea works with the new Lucene per-segment collection and in
distributed mode (sharded index).
Within a collapsed group you can sort hits however you want and select the one
that will represent the group in the usual sort/paging.
The implementation is not brain-dead simple, but it comes close.
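
The within-group sorting point can be sketched in Python with a hypothetical
`Hit` record (the `date` field is an illustrative assumption).
`itertools.groupby` is enough to split the hits into groups only because Part I
keeps each group contiguous in docId order. Each group is ordered by an
arbitrary caller-supplied key (here, newest first), its first hit becomes the
representative, and only the representatives' scores decide paging.

```python
from itertools import groupby
from typing import Callable, List, NamedTuple


class Hit(NamedTuple):
    doc_id: int
    key: str      # collapse-field value
    score: float
    date: int     # hypothetical secondary field, e.g. a timestamp


def collapsed_page(hits: List[Hit], pick: Callable[[Hit], float],
                   start: int, rows: int) -> List[List[Hit]]:
    """Collapse contiguous groups, then page over them by representative score.

    `hits` must be in docId order so each group is contiguous; `pick` orders
    hits inside a group (smallest first) and the first one represents it.
    """
    groups = [sorted(g, key=pick) for _, g in groupby(hits, key=lambda h: h.key)]
    # only the representative (group[0]) decides the group's place in the usual sort
    groups.sort(key=lambda g: g[0].score, reverse=True)
    return groups[start:start + rows]


hits = [Hit(0, "a", 1.0, 10), Hit(1, "a", 3.0, 5), Hit(2, "b", 2.0, 7),
        Hit(3, "c", 0.5, 9), Hit(4, "c", 4.0, 1)]
page = collapsed_page(hits, pick=lambda h: -h.date, start=0, rows=2)
print([g[0].doc_id for g in page])  # [2, 0]
```

Note that group "c" sorts last even though it contains the highest-scoring
hit, because its newest doc (the representative) scores lowest.
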

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, 
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> - "collapse.field" to choose the field used to group results
> - "collapse.type": normal (default value) or adjacent
> - "collapse.max" to select how many consecutive results are allowed before
>   collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)
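
The description above does not spell out exactly how collapse.type and
collapse.max interact. The Python sketch below encodes one plausible reading,
which is an assumption and not taken from any of the attached patches: with
"normal", at most collapse.max results per value are kept anywhere in the
list; with "adjacent", only runs of consecutive results with the same value
are cut down to collapse.max. The function name `collapse` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def collapse(values: Sequence[str], collapse_type: str = "normal",
             collapse_max: int = 1) -> List[int]:
    """Return positions of results kept after collapsing.

    Assumed semantics (not taken from the patch):
      normal   -> keep at most collapse_max results per value overall
      adjacent -> keep at most collapse_max consecutive results per value
    """
    if collapse_type not in ("normal", "adjacent"):
        raise ValueError(f"unknown collapse.type: {collapse_type}")
    kept: List[int] = []
    seen: Dict[str, int] = {}        # per-value counts for "normal"
    prev: Optional[str] = None       # previous value for "adjacent"
    run = 0                          # length of the current same-value run
    for pos, value in enumerate(values):
        if collapse_type == "normal":
            seen[value] = seen.get(value, 0) + 1
            count = seen[value]
        else:
            run = run + 1 if value == prev else 1
            prev = value
            count = run
        if count <= collapse_max:
            kept.append(pos)
    return kept


values = ["a", "a", "b", "a", "a", "a"]
print(collapse(values, "normal", 1))    # [0, 2]
print(collapse(values, "adjacent", 1))  # [0, 2, 3]
```
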

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
