[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

Hi Thomas,

I have fixed the problem and updated the patch. I was able to reproduce the bug on the Solr example dataset. The problem was not limited to field collapsing with sorting on a field alone. It was located in the doCollapse(...) method of the NonAdjactentFieldCollapser, in this specific part:

{code}
// dropoutId has a value smaller than the smallest value in the queue and therefore it was removed from the queue
collapseDoc.priorityQueue.insertWithOverflow(currentId);
// check if we have reached the collapse threshold, if so start counting collapsed documents
if (++collapseDoc.totalCount > collapseTreshold) {
  collapseDoc.collapsedDocuments++;
  if (dropOutId != null) {
    addCollapsedDoc(currentId, currentValue);
  }
}
{code}

Let's say that the currentId has the most relevant field value and the collapseThreshold is met. When the currentId is added to the queue, it stays there and a different document id is dropped out. In this situation the document with the most relevant field value is added to the collapsed documents, but because it also stays in the queue, it is added to the normal results as well.

I changed it to this:

{code}
// dropoutId has a value smaller than the smallest value in the queue and therefore it was removed from the queue
Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);
// check if we have reached the collapse threshold, if so start counting collapsed documents
if (++collapseDoc.totalCount > collapseTreshold) {
  collapseDoc.collapsedDocuments++;
  if (dropOutId != null) {
    // record the document that left the queue, not the one just inserted
    addCollapsedDoc(dropOutId, currentValue);
  }
}
{code}

Now only a document that will never end up in the final results is added to the collapsed documents (and not the current document, which might be more relevant than other documents in the priority queue).
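To make the difference concrete, here is a minimal, self-contained Java sketch of a single collapse group. It does not use Solr or Lucene: `BoundedQueue` is a hypothetical stand-in that mimics the contract of Lucene's `PriorityQueue.insertWithOverflow` (returns null while the queue has room, otherwise the element that did not fit), and `collapseGroup`, `Result` and the sample scores are illustrative assumptions. The `fixed` flag switches between recording the overflowed id (the change above) and recording `currentId` (the old behavior).

{code}
import java.util.*;

class CollapseSketch {

    /** Hypothetical stand-in for Lucene's PriorityQueue: keeps the maxSize
     *  highest-scoring ids, with the least relevant id at the head of the heap. */
    static final class BoundedQueue {
        private final int maxSize;
        private final Map<Integer, Float> scores;
        private final PriorityQueue<Integer> heap;

        BoundedQueue(int maxSize, Map<Integer, Float> scores) {
            this.maxSize = maxSize;
            this.scores = scores;
            this.heap = new PriorityQueue<Integer>((a, b) -> Float.compare(scores.get(a), scores.get(b)));
        }

        /** Returns null while there is room; otherwise the id that does not fit,
         *  which is either an evicted older id or the new id itself. */
        Integer insertWithOverflow(Integer id) {
            if (heap.size() < maxSize) {
                heap.add(id);
                return null;
            }
            Integer least = heap.peek();
            if (scores.get(id) >= scores.get(least)) {
                heap.poll();          // evict the least relevant id
                heap.add(id);
                return least;
            }
            return id;                // the new id itself overflows
        }

        Set<Integer> contents() {
            return new TreeSet<>(heap);
        }
    }

    static final class Result {
        final List<Integer> collapsed = new ArrayList<>();
        Set<Integer> kept;            // ids left in the queue, i.e. the normal results
    }

    static Result collapseGroup(List<Integer> ids, Map<Integer, Float> scores,
                                int threshold, boolean fixed) {
        BoundedQueue queue = new BoundedQueue(threshold, scores);
        Result result = new Result();
        int totalCount = 0;
        for (Integer currentId : ids) {
            Integer dropOutId = queue.insertWithOverflow(currentId);
            if (++totalCount > threshold && dropOutId != null) {
                // fixed: record the id that left the queue; old code: record currentId
                result.collapsed.add(fixed ? dropOutId : currentId);
            }
        }
        result.kept = queue.contents();
        return result;
    }

    static Map<Integer, Float> demoScores() {
        Map<Integer, Float> scores = new HashMap<>();
        scores.put(1, 0.5f);
        scores.put(2, 0.9f);
        scores.put(3, 0.7f);
        scores.put(4, 0.95f);
        return scores;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4);
        Result fixed = collapseGroup(ids, demoScores(), 2, true);
        Result old = collapseGroup(ids, demoScores(), 2, false);
        System.out.println("fixed: kept=" + fixed.kept + " collapsed=" + fixed.collapsed);
        System.out.println("old:   kept=" + old.kept + " collapsed=" + old.collapsed);
    }
}
{code}

With the sample data the fixed variant keeps documents 2 and 4 and reports 1 and 3 as collapsed; the old variant reports 3 and 4 as collapsed, so document 4 shows up both in the normal results and in the collapsed documents.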
The above code change fixes the bug in my test setups. Can you also confirm that it fixes the issue on your side?

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-solr-236-2.patch,
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch,
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given
> field to a single entry in the result set. Site collapsing is a special case
> of this, where all results for a given web site is collapsed into one or two
> entries in the result set, typically with an associated "more documents from
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>
> The implementation adds 3 new query parameters (SolrParams):
> - "collapse.field" to choose the field used to group results
> - "collapse.type" normal (default value) or adjacent
> - "collapse.max" to select how many continuous results are allowed before
>   collapsing
>
> TODO (in progress):
> - More documentation (in the source code)
> - Test cases
>
> Two patches:
> - "field_collapsing.patch" for the current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>
> P.S.: Feedback and misspelling corrections are welcome ;-)
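The quoted description distinguishes two collapse types. As a rough illustration of how `collapse.type` and `collapse.max` could interact, here is a Java sketch under one reading of that text: in normal mode at most `max` results per field value are kept across the whole ranked list, while in adjacent mode only runs of more than `max` consecutive results with the same value are collapsed. The `CollapseModes.collapse` method and the sample site values are hypothetical; the actual patch works on Lucene document ids inside Solr rather than on a plain list of strings.

{code}
import java.util.*;

class CollapseModes {

    /**
     * Returns the positions (in rank order) of the results that survive collapsing.
     * fieldValues holds the collapse field value of each ranked result.
     */
    static List<Integer> collapse(List<String> fieldValues, int max, boolean adjacent) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> perValue = new HashMap<>(); // normal mode: count per value over the whole list
        String previous = null;                          // adjacent mode: value of the previous result
        int run = 0;                                     // adjacent mode: length of the current run
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            int count;
            if (adjacent) {
                run = Objects.equals(value, previous) ? run + 1 : 1;
                previous = value;
                count = run;
            } else {
                count = perValue.merge(value, 1, Integer::sum);
            }
            if (count <= max) {
                kept.add(i); // within the limit: kept as a normal result
            }
            // results over the limit are the ones that would be collapsed
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a.com", "a.com", "b.com", "a.com", "a.com", "a.com");
        System.out.println("normal:   " + collapse(sites, 2, false));
        System.out.println("adjacent: " + collapse(sites, 2, true));
    }
}
{code}

For the ranked values a.com, a.com, b.com, a.com, a.com, a.com with max = 2, normal mode keeps positions 0 to 2, while adjacent mode keeps positions 0 to 4 and only collapses the third consecutive a.com.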