[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771155#action_12771155 ]
Martijn van Groningen commented on SOLR-236:
--------------------------------------------

It certainly has been going on for a long time :-)
Talking about the last miles, there are a few things on my mind about field collapsing:

* Change the response format. Currently, even I sometimes get confused by the information returned when I look at the response. The response should be more structured. Something like this:
{code:xml}
<lst name="collapse_counts">
    <str name="field">venue</str>
    <lst name="results">
        <lst name="233238"> <!-- id of most relevant document of the group -->
            <str name="fieldValue">melkweg</str>
            <int name="collapseCount">2</int>
            <!-- and other CollapseCollector specific collapse information -->
        </lst>
        ...
    </lst>
</lst>
{code}
Currently, when doing adjacent field collapsing, the _collapse_counts_ gives results that are unusable. The _collapse_counts_ uses the field value as key, which is not unique for adjacent collapsing, as shown in this example:
{code:xml}
<lst name="collapse_counts">
    <int name="hard">1</int>
    <int name="hard">1</int>
    <int name="electronics">1</int>
    <int name="memory">2</int>
    <int name="monitor">1</int>
</lst>
{code}
* Add the notion of a CollapseMatcher that decides whether document field values are equal or not, and thus whether they are allowed to be collapsed. This opens the road for more exotic features like fuzzy field collapsing and collapsing on more than one field. It also allows users of the patch to easily implement their own matching rules.
* Distributed field collapsing. Although I have some ideas on how to get started, from my perspective it is not going to be performant, because somehow the field collapse state has to be shared between shards in order to do proper field collapsing. This state can potentially be a lot of data, depending on the specific search and corpus.
* And maybe add a collapse collector that collects statistics about the most common field value per collapsed group.
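To make the CollapseMatcher idea and the "key by document id" response proposal a bit more concrete, here is a minimal Java sketch. Everything in it is hypothetical and is not taken from the patch: the {{CollapseMatcher}} signature, the {{Doc}} and {{Group}} holders and the {{collapseAdjacent}} helper are illustrative names only, and the sketch ignores _collapse.max_ by collapsing everything after the first document of an adjacent run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SOLR-236 patch API.
final class CollapseSketch {

    /** Assumed interface: decides whether a candidate value may join a group. */
    interface CollapseMatcher {
        boolean matches(String groupValue, String candidateValue);
    }

    /** Exact string comparison, i.e. the default behaviour one would expect. */
    static final CollapseMatcher EXACT = (a, b) -> a.equals(b);

    /** A very loose "fuzzy" rule: ignore case and surrounding whitespace. */
    static final CollapseMatcher IGNORE_CASE =
            (a, b) -> a.trim().equalsIgnoreCase(b.trim());

    /** A search hit reduced to its id and the value of the collapse field. */
    static final class Doc {
        final String id;
        final String fieldValue;

        Doc(String id, String fieldValue) {
            this.id = id;
            this.fieldValue = fieldValue;
        }
    }

    /** One collapse group, keyed by the id of its first (most relevant) document. */
    static final class Group {
        final String headDocId;
        final String fieldValue;
        int collapseCount; // documents hidden behind the head document

        Group(String headDocId, String fieldValue) {
            this.headDocId = headDocId;
            this.fieldValue = fieldValue;
        }
    }

    /**
     * Adjacent collapsing over hits that are already sorted by relevance.
     * A new group starts whenever the matcher rejects the next value, so the
     * same field value can legitimately show up in several groups.
     */
    static List<Group> collapseAdjacent(List<Doc> ranked, CollapseMatcher matcher) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (Doc doc : ranked) {
            if (current != null && matcher.matches(current.fieldValue, doc.fieldValue)) {
                current.collapseCount++;
            } else {
                current = new Group(doc.id, doc.fieldValue);
                groups.add(current);
            }
        }
        return groups;
    }
}
```

For hits with the field values hard, hard, memory, hard this yields three groups, two of which share the value "hard". Keyed by field value they would clash, keyed by the head document id they stay distinct, which is the motivation for the response format suggested above.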

I think that this is somewhat the roadmap from my side for field collapsing at the moment, but feel free to elaborate on this.

Btw, I have recently written a [blog|http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/] about field collapsing in general, which might be handy for anyone who is implementing field collapsing.

> Field collapsing
> ----------------
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Emmanuel Keller
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)