[jira] Updated: (SOLR-236) Field collapsing

Martijn van Groningen (JIRA) Sat, 26 Sep 2009 09:57:02 -0700

     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Martijn van Groningen updated SOLR-236:
---------------------------------------

    Attachment: field-collapse-5.patch

I have created a new patch that has the following changes:
1) Non-adjacent collapsing with sorting on score now also uses the Solr caches. 
As a result, all field collapse searches now use the Solr caches properly, 
which was not the case in previous versions of the patch. This improvement 
makes field collapsing perform better and reduces the query time for regular 
searches. The downside is that, in order to make this work, I had to modify 
some methods in the SolrIndexSearcher. 

When sorting on score, the non-adjacent collapsing algorithm needs the score per 
document. The score is collected in a Lucene collector. The previous version of 
the patch used the searcher.search(Query, Filter, Collector) method to collect 
the documents (as a DocSet) and scores, but that method bypasses the Solr 
caches.
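
As a rough illustration of why the score per document matters here, non-adjacent collapsing with sorting on score can be sketched as follows. This is a simplified sketch and not the code from the patch: the class CollapseSketch, the method collapseByScore, and the array-based inputs are hypothetical, and it keeps only one document per field value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the patch or of Solr.
class CollapseSketch {

    /**
     * Collapses documents that share a field value, wherever they appear in
     * the result (non-adjacent), keeping the highest-scoring document of each
     * group. Array indexes stand in for document ids.
     */
    static List<Integer> collapseByScore(String[] fieldValues, float[] scores) {
        Map<String, Integer> bestDocPerValue = new LinkedHashMap<>();
        for (int doc = 0; doc < fieldValues.length; doc++) {
            Integer best = bestDocPerValue.get(fieldValues[doc]);
            // The score of every document is needed to pick a group's survivor.
            if (best == null || scores[doc] > scores[best]) {
                bestDocPerValue.put(fieldValues[doc], doc);
            }
        }
        List<Integer> result = new ArrayList<>(bestDocPerValue.values());
        // Order the surviving documents by descending score.
        result.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return result;
    }
}
```

Without the scores, neither the choice of the surviving document nor the final ordering could be made, which is why a collector that records scores is required.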

The methods that return a DocSet in the SolrIndexSearcher did not offer the 
ability to specify your own collector. I changed that so you can specify your 
own collector and still benefit from the Solr caches. I did this in a 
non-intrusive manner, so that nothing changes for existing code that uses the 
normal versions of these methods. 
{code}
  public DocSet getDocSet(Query query) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc()>>6, maxDoc());
    return getDocSet(query, collector);
  }

  public DocSet getDocSet(Query query, DocSetAwareCollector collector) throws IOException {
    ....
  }

  DocSet getPositiveDocSet(Query q) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc()>>6, maxDoc());
    return getPositiveDocSet(q, collector);
  }

  DocSet getPositiveDocSet(Query q, DocSetAwareCollector collector) throws IOException {
    .....
  }

  public DocSet getDocSet(List<Query> queries) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc()>>6, maxDoc());
    return getDocSet(queries, collector);
  }

  public DocSet getDocSet(List<Query> queries, DocSetAwareCollector collector) throws IOException {
    .......
  }

  protected DocSet getDocSetNC(Query query, DocSet filter) throws IOException {
    DocSetCollector collector = new DocSetCollector(maxDoc()>>6, maxDoc());
    return getDocSetNC(query, filter, collector);
  }

  protected DocSet getDocSetNC(Query query, DocSet filter, DocSetAwareCollector collector) throws IOException {
    .........
  }
{code}
I also made a DocSetAwareCollector that both DocSetCollector and 
DocSetScoreCollector implement.
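
The overload pattern described above (the existing method builds a default collector and delegates to a new variant that accepts any collector) can be sketched in isolation like this. All names here (HitCollector, PlainCollector, ScoreKeepingCollector, ToySearcher) are hypothetical stand-ins and not the Solr classes from the patch; caching is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common type, playing the role a shared collector type would play.
interface HitCollector {
    void collect(int doc, float score);
    List<Integer> docs();
}

// Default collector: records matching document ids only.
class PlainCollector implements HitCollector {
    private final List<Integer> docs = new ArrayList<>();
    public void collect(int doc, float score) { docs.add(doc); }
    public List<Integer> docs() { return docs; }
}

// Alternative collector: records document ids and their scores.
class ScoreKeepingCollector implements HitCollector {
    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();
    public void collect(int doc, float score) {
        docs.add(doc);
        scores.add(score);
    }
    public List<Integer> docs() { return docs; }
    public List<Float> scores() { return scores; }
}

class ToySearcher {
    private final float[] scoreByDoc;

    ToySearcher(float[] scoreByDoc) { this.scoreByDoc = scoreByDoc; }

    // Existing entry point: same signature and behavior as before.
    List<Integer> getDocs(float minScore) {
        return getDocs(minScore, new PlainCollector());
    }

    // New overload: the caller supplies the collector.
    List<Integer> getDocs(float minScore, HitCollector collector) {
        for (int doc = 0; doc < scoreByDoc.length; doc++) {
            if (scoreByDoc[doc] >= minScore) {
                collector.collect(doc, scoreByDoc[doc]);
            }
        }
        return collector.docs();
    }
}
```

Existing callers keep using getDocs(minScore) unchanged, while code that needs scores passes a ScoreKeepingCollector and reads scores() afterwards.
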
2) The collapse.includeCollapsedDocs parameter has been removed. To include 
the collapsed documents, the parameter collapse.includeCollapsedDocs.fl must 
be specified instead. collapse.includeCollapsedDocs.fl=* includes all fields 
of the collapsed documents, and collapse.includeCollapsedDocs.fl=id,name 
includes only the id and name fields of the collapsed documents.
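
For illustration, a request that collapses on a field and returns the id and name of the collapsed documents could be assembled as below. This is a minimal sketch: the class CollapseParams, the field name "site", and the query "*:*" are assumptions chosen for the example; only the collapse.field and collapse.includeCollapsedDocs.fl parameter names come from this issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it only builds a URL query string.
class CollapseParams {

    static String buildQueryString(String query, String collapseField, String collapsedDocsFl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("collapse.field", collapseField);
        // "*" would request all fields; a comma-separated list restricts them.
        params.put("collapse.includeCollapsedDocs.fl", collapsedDocsFl);

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling CollapseParams.buildQueryString("*:*", "site", "id,name") yields q=*%3A*&collapse.field=site&collapse.includeCollapsedDocs.fl=id%2Cname, which can be appended to the URL of a select request.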

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
> SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
