[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

Hi Thomas, I have fixed the problem and updated the patch. I was able to 
reproduce the bug on the Solr example dataset. The problem was not limited to 
field collapsing with sorting on a field alone. It was located in the 
doCollapse(...) method of NonAdjactentFieldCollapser, in this specific part:
{code}
      // dropoutId has a value smaller than the smallest value in the queue and
      // therefore it was removed from the queue
      collapseDoc.priorityQueue.insertWithOverflow(currentId);

      // check if we have reached the collapse threshold, if so start counting
      // collapsed documents
      if (++collapseDoc.totalCount > collapseTreshold) {
        collapseDoc.collapsedDocuments++;
        if (dropOutId != null) {
          addCollapsedDoc(currentId, currentValue);
        }
      }
{code}
Let's say that the currentId has the most relevant field value and the 
collapseThreshold is met. When currentId is added to the queue, it stays 
there and another document id is dropped out instead. In this situation a 
document with the most relevant field value is added to the collapsed 
documents, yet it also stays in the queue, so it is added to the normal 
results as well.
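A minimal, hypothetical Java sketch of why this goes wrong. It assumes the queue behaves like a size-bounded queue whose insertWithOverflow(...) returns the element that fell out (or null while there is still room). BoundedQueue and OverflowDemo are illustrative names built on java.util.PriorityQueue, not the Lucene or patch classes, and a larger int simply stands for a more relevant document. When the most relevant currentId arrives at a full queue, the returned element is a different id, and currentId itself stays in the queue:

```java
import java.util.PriorityQueue;

/**
 * Hypothetical stand-in for a size-bounded queue with insertWithOverflow(...)
 * semantics: it keeps the N highest-ranked values and each insert returns
 * the value that fell out (possibly the inserted one), or null if none did.
 */
public class OverflowDemo {

    static final class BoundedQueue {
        private final int maxSize;
        // Min-heap: the head is the least relevant value currently kept.
        private final PriorityQueue<Integer> heap = new PriorityQueue<>();

        BoundedQueue(int maxSize) { this.maxSize = maxSize; }

        Integer insertWithOverflow(int value) {
            if (heap.size() < maxSize) {
                heap.add(value);
                return null;                    // still room, nothing dropped
            }
            if (value > heap.peek()) {
                Integer dropped = heap.poll();  // evict the current minimum
                heap.add(value);
                return dropped;
            }
            return value;                       // the new value does not fit
        }

        boolean contains(int value) { return heap.contains(value); }
    }

    public static void main(String[] args) {
        // Assumption for the demo: a larger int means "more relevant".
        BoundedQueue queue = new BoundedQueue(2);
        queue.insertWithOverflow(10);
        queue.insertWithOverflow(20);
        int currentId = 30;                     // most relevant value so far
        Integer dropOutId = queue.insertWithOverflow(currentId);
        System.out.println("dropOutId = " + dropOutId);                     // dropOutId = 10
        System.out.println("currentId kept? " + queue.contains(currentId)); // currentId kept? true
    }
}
```

Recording currentId as collapsed at this point would therefore list a document that is still in the queue.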

I changed it to this:
{code}
      // dropoutId has a value smaller than the smallest value in the queue and
      // therefore it was removed from the queue
      Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);

      // check if we have reached the collapse threshold, if so start counting
      // collapsed documents
      if (++collapseDoc.totalCount > collapseTreshold) {
        collapseDoc.collapsedDocuments++;
        if (dropOutId != null) {
          addCollapsedDoc(dropOutId, currentValue);
        }
      }
{code}
Now only a document that will never end up in the final results is added to 
the collapsed documents (and not the current document, which might be more 
relevant than other documents in the priority queue). The above code change 
fixes the bug in my test setups; can you confirm that it also fixes the issue 
on your side?
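A simplified, hypothetical sketch of the corrected step. CollapseGroup, its fields, and a queue capacity equal to the threshold are assumptions standing in for the patch's collapse bookkeeping; the overflow insert mimics the size-bounded queue contract with java.util.PriorityQueue. Only the id returned by the overflow insert is recorded as collapsed, so no id can be both kept in the queue and listed as collapsed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of one collapse group after the fix. */
public class CollapseFixDemo {

    static final class CollapseGroup {
        final int threshold;                                   // plays the role of collapseTreshold
        final PriorityQueue<Integer> queue = new PriorityQueue<>(); // min-heap of kept ids
        final List<Integer> collapsed = new ArrayList<>();
        int totalCount;
        int collapsedDocuments;

        CollapseGroup(int threshold) { this.threshold = threshold; }

        /** Returns the id that fell out of the bounded queue, or null. */
        Integer insertWithOverflow(int id) {
            if (queue.size() < threshold) {
                queue.add(id);
                return null;
            }
            if (id > queue.peek()) {
                Integer dropped = queue.poll();
                queue.add(id);
                return dropped;
            }
            return id;
        }

        void add(int currentId) {
            Integer dropOutId = insertWithOverflow(currentId);
            if (++totalCount > threshold) {
                collapsedDocuments++;
                if (dropOutId != null) {
                    // The fix: record the evicted id, not currentId.
                    collapsed.add(dropOutId);
                }
            }
        }
    }

    public static void main(String[] args) {
        CollapseGroup group = new CollapseGroup(2);
        for (int id : new int[] {10, 20, 30, 5}) {
            group.add(id);
        }
        System.out.println("kept: " + group.queue);
        System.out.println("collapsed: " + group.collapsed); // collapsed: [10, 5]
    }
}
```

The last id (5) does not fit in the full queue, so the overflow insert returns it and it is correctly counted as collapsed, while 30 stays in the queue and is never reported as collapsed.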

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-solr-236-2.patch, 
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds three new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and spelling corrections are welcome ;-)
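One possible reading of the two collapse.type modes, as a hypothetical Java sketch rather than the patch's code: each result is reduced to its collapse.field value in ranked order, and max stands in for collapse.max. "normal" keeps at most max results per value across the whole list, while "adjacent" only limits consecutive runs of the same value:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of collapse.type=normal vs adjacent. */
public class CollapseTypesDemo {

    /** normal: keep at most max results per field value over the whole list. */
    static List<String> collapseNormal(List<String> values, int max) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            int count = seen.merge(v, 1, Integer::sum); // occurrences so far
            if (count <= max) {
                out.add(v);
            }
        }
        return out;
    }

    /** adjacent: only consecutive runs of the same value are collapsed. */
    static List<String> collapseAdjacent(List<String> values, int max) {
        List<String> out = new ArrayList<>();
        String previous = null;
        int run = 0;
        for (String v : values) {
            run = v.equals(previous) ? run + 1 : 1; // length of the current run
            previous = v;
            if (run <= max) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("a.com", "a.com", "b.com", "a.com", "a.com", "a.com");
        System.out.println(collapseNormal(sites, 2));   // [a.com, a.com, b.com]
        System.out.println(collapseAdjacent(sites, 2)); // [a.com, a.com, b.com, a.com, a.com]
    }
}
```

With max = 2, the later run of a.com results survives in adjacent mode because it is not contiguous with the first run.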

-- 
This message is automatically generated by JIRA.