[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647335#action_12647335
 ] 

Iván de Prado commented on SOLR-236:
------------------------------------

I attached a patch named collapsing-patch-to-1.3.0-ivan.patch. The patch 
applies to Solr 1.3.0.

In his comment of 06/Nov/07 02:06 PM, Karsten Sperling wrote:
{quote}
Inverted the logic of the filter DocSet created by CollapseFilter to contain 
the documents that are to be collapsed instead of the ones that are to be kept. 
Without this collapse.maxdocs doesn't work.
{quote}

I found that this approach consumes a lot of memory, even when the query is 
limited to a small number of documents. I also found that there is no advantage 
in using collapse.maxdocs unless it speeds up queries and reduces the amount 
of memory needed. 

So I decided to revert Karsten's change in order to make field collapsing 
faster and less resource-intensive when querying smaller datasets.

WARNING: This patch changes the semantics of collapse.maxdocs. Before this 
patch, collapse.maxdocs was used only to reduce the number of docs checked 
for grouping; the remaining documents, which were not grouped, were still 
presented in the result. 

With the current patch, only documents that were examined for grouping can 
appear in the result. These semantics have two benefits:
- The amount of resources used can be controlled per query
- No ungrouped content is presented.
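
To make the difference concrete, here is a small toy model of the two 
behaviours in Python. It is only a sketch under stated assumptions and is not 
the code from the patch: the helper names (collapse, old_semantics, 
new_semantics), the "site" field and the sample documents are all hypothetical, 
and collapsing is simplified to "keep the first document per field value".

```python
def collapse(docs, field, max_per_group=1):
    """Keep at most max_per_group docs per distinct value of `field`, in order."""
    counts = {}
    kept = []
    for doc in docs:
        key = doc[field]
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= max_per_group:
            kept.append(doc)
    return kept


def old_semantics(docs, field, maxdocs):
    # Assumed model of the previous behaviour: only the first `maxdocs` docs
    # are checked for grouping; the rest are appended without being grouped.
    return collapse(docs[:maxdocs], field) + docs[maxdocs:]


def new_semantics(docs, field, maxdocs):
    # Assumed model of the patched behaviour: docs that were not examined
    # for grouping never reach the result.
    return collapse(docs[:maxdocs], field)


# Hypothetical ranked result list, collapsed on a hypothetical "site" field.
docs = [
    {"id": 1, "site": "a"},
    {"id": 2, "site": "a"},
    {"id": 3, "site": "b"},
    {"id": 4, "site": "a"},
    {"id": 5, "site": "b"},
]

old_ids = [d["id"] for d in old_semantics(docs, "site", maxdocs=3)]
new_ids = [d["id"] for d in new_semantics(docs, "site", maxdocs=3)]
print(old_ids)  # [1, 3, 4, 5]
print(new_ids)  # [1, 3]
```

In this model, the old behaviour lets documents 4 and 5 through ungrouped, 
while the new behaviour bounds both the work done and the result to the 
examined documents.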


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

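As an illustration of the query parameters listed in the description above, 
the following Python sketch builds a request query string. The parameter names 
come from the issue text; the base URL, the query term and the "site" field 
are hypothetical, and whether a given version of the patch accepts exactly 
these values is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to your own installation.
base_url = "http://localhost:8983/solr/select"

params = {
    "q": "solr",                # hypothetical query term
    "collapse.field": "site",   # field used to group results (hypothetical name)
    "collapse.type": "normal",  # "normal" (default) or "adjacent"
    "collapse.max": 1,          # results allowed before collapsing
}

query_string = urlencode(params)
print(base_url + "?" + query_string)
```
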
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
