Re: [jira] Commented: (SOLR-1311) pseudo-field-collapsing

Marc Sturlese Mon, 14 Sep 2009 02:09:12 -0700

Well, the thing is that my patch performs so well precisely because, for
now, it cannot be integrated as a plugin. The field collapsing patch does 2
"searches": one to pick the ids to collapse and a second to filter those ids
in the main search.
What I do is pseudo-collapse straight in the main search, reordering the ids
in getDocListAndSetNC and getDocListNC, so response times are almost the
same with or without the patch.


JIRA j...@apache.org wrote:
> 
> 
>     [
> https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754509#action_12754509
> ] 
> 
> Uri Boness commented on SOLR-1311:
> ----------------------------------
> 
> Wouldn't it be an idea to try and merge this code with the original field
> collapsing patch? Quite a bit of work was done recently on that patch to
> make it more extensible. So for example, you now have a _Collapser_
> interface that encapsulates the actual collapsing algorithm, and my guess
> is that your algorithm can probably fit there. Indeed when the corpus is
> large, adjacent field collapsing can turn into a performance issue, and
> having this pseudo algorithm seems to make a lot of sense. So for example,
> using the original field collapsing patch, it would be nice if we could
> just define another parameter called collapse.type which will hold one of
> three values: adjacent, pseudo-adjacent, and non-adjacent.
> 
> BTW, I haven't looked at your patch yet and I don't know how well it works
> with faceting. But integrating it with the original patch will give you
> that support (i.e. before/after collapse facet counts support)
> automatically.
> 
>> pseudo-field-collapsing
>> -----------------------
>>
>>                 Key: SOLR-1311
>>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.4
>>            Reporter: Marc Sturlese
>>             Fix For: 1.5
>>
>>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>
>>
>> I am trying to develop a new way of doing field collapsing based on the
>> adjacent field collapsing algorithm. I started developing it because I
>> am experiencing performance problems with the field collapsing patch on
>> a big index (8G).
>> The algorithm does adjacent-pseudo-field collapsing. It does collapsing on
>> the first X documents. Instead of making the collapsed docs disappear, the
>> algorithm sends them to a given position in the relevance results list.
>> The reason I only do collapsing on the first X documents is that if I
>> have, for example, 600000 results and I am showing 10 results per page, I
>> really don't need to do collapsing on page 30000, or even on page 3000.
>> Doing this I am noticing dramatically better performance. The problem is
>> that I couldn't find a way to plug the algorithm in as a component and
>> keep good performance. I had to hack a few classes in
>> SolrIndexSearcher.java.
>> This patch is just experimental and for testing purposes. In case someone
>> finds it interesting, it would be good to find a way to integrate it in a
>> better way than it is at the moment.
>> Advice is more than welcome.
>>      
>> Functionality:
>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>>      <str name="plus.considerMoreDocs">true</str>
>>      <str name="plus.considerHowMany">3000</str>
>>      <str name="plus.considerField">name</str>
>> (at the moment there's no threshold and other parameters that exist in
>> the current collapse-field patch)
>> plus.considerMoreDocs enables pseudo-collapsing
>> plus.considerHowMany sets the number of result documents to which we
>> want to apply the algorithm
>> plus.considerField is the field to do pseudo-collapsing on
>> If the number of results is lower than plus.considerHowMany, the algorithm
>> will be applied to all the results.
>> Let's say there is a query with 600000 results and we've set
>> considerHowMany to 3000 (and we already have the docs sorted by
>> relevance).
>> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be
>> collapsed, it will be sent to position 2999 of the relevance results
>> array. If the 3rd has to be collapsed too, it will go to position 2998,
>> and so on.
>> The algorithm is not applied when a sortspec is set or
>> plus.considerMoreDocs is set to false. Neither is it applied when using
>> MoreLikeThisRequestHandler.
>> Example with a query of 9 results:
>> Results sorted by relevance without pseudo-collapse-algorithm:
>> doc1 - collapse_field_value 3
>> doc2 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 5
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc2 - collapse_field_value 3*
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 9
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> doc6 - collapse_field_value 6*
>> doc2 - collapse_field_value 3*
>> *pseudo-collapsed documents
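
For illustration, the reordering described in the issue can be sketched in a
few lines of Python. This is not the code from the patch (which modifies
SolrIndexSearcher.java). The function name pseudo_collapse, its parameters
and the (doc_id, field_value) tuple layout are hypothetical, and the sketch
assumes a document is collapsed when its field value equals that of the
document right before it in relevance order.

```python
def pseudo_collapse(docs, consider_how_many, enabled=True, has_sort_spec=False):
    """Reorder relevance-sorted (doc_id, field_value) pairs.

    Hypothetical sketch: only the first `consider_how_many` docs are examined.
    Docs that repeat the previous doc's field value are moved to the end of
    that window instead of being removed.
    """
    # Assumed guard, mirroring the stated rule that the algorithm is skipped
    # when it is disabled or when a sort spec is set.
    if not enabled or has_sort_spec:
        return list(docs)

    # Slicing past the end is safe, so a short result list is processed whole.
    window = docs[:consider_how_many]
    rest = docs[consider_how_many:]

    kept, collapsed = [], []
    previous_value = object()  # sentinel that equals no real field value
    for doc in window:
        _, value = doc
        if value == previous_value:
            collapsed.append(doc)  # adjacent duplicate: demote it
        else:
            kept.append(doc)
        previous_value = value

    # The first collapsed doc takes the last slot of the window, the next one
    # the slot before it, and so on; hence the reversed order.
    return kept + collapsed[::-1] + rest


values = [3, 3, 4, 7, 6, 6, 5, 1, 2]
docs = [("doc%d" % (i + 1), v) for i, v in enumerate(values)]
for how_many in (5, 9):
    print(how_many, [doc_id for doc_id, _ in pseudo_collapse(docs, how_many)])
```

Run on the 9 results of the example, a window of 5 moves only doc2 (to
position 5), and a window of 9 moves doc6 and doc2 to positions 8 and 9,
which matches the two listings above.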
> 

-- 
View this message in context: 
http://www.nabble.com/-jira--Created%3A-%28SOLR-1311%29-pseudo-field-collapsing-tp24684526p25432571.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
