Well, the thing is that my patch performs very well precisely because, for now, it cannot be integrated as a plugin. The field collapsing patch does two "searches": one to pick the ids to collapse and a second to filter those ids in the main search. What I do is pseudo-collapse straight in the main search, reordering the ids in getDocListAndSetNC and getDocListNC, so response times are almost the same with or without the patch.
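
To make the reordering concrete, here is a minimal sketch in Java of how I read the adjacent pseudo-collapse step described in the issue quoted below. It is not the code from the patch: the class and method names are hypothetical, and it assumes the docs are already sorted by relevance and that the collapse field value of each ranked doc is available in a parallel array.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration only; not taken from SOLR-1311 or SolrIndexSearcher.
public class PseudoCollapseSketch {

    /**
     * Reorders the first considerHowMany entries of a relevance-sorted result list.
     * A doc whose collapse field value equals that of the doc ranked just before it
     * is moved to the back of the window (the first such doc takes the last slot,
     * the next one the slot before it, and so on). All other docs keep their
     * relative order at the front. Docs outside the window are left untouched.
     *
     * fieldValues[i] is assumed to be the collapse field value of docIds[i].
     */
    public static int[] pseudoCollapse(int[] docIds, String[] fieldValues, int considerHowMany) {
        // If there are fewer results than considerHowMany, the window covers all of them.
        int window = Math.min(considerHowMany, docIds.length);
        int[] result = Arrays.copyOf(docIds, docIds.length);

        int front = 0;          // next free slot for a doc that is kept in place
        int back = window - 1;  // next free slot for a pseudo-collapsed doc

        for (int i = 0; i < window; i++) {
            boolean sameAsPrevious =
                i > 0 && Objects.equals(fieldValues[i], fieldValues[i - 1]);
            if (sameAsPrevious) {
                result[back--] = docIds[i];
            } else {
                result[front++] = docIds[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        String[] values = {"3", "3", "4", "7", "6", "6", "5", "1", "2"};
        System.out.println(Arrays.toString(pseudoCollapse(ids, values, 5)));
        System.out.println(Arrays.toString(pseudoCollapse(ids, values, 9)));
    }
}
```

With the nine-document example from the issue, this gives the same orderings as listed there for plus.considerHowMany = 5 and = 9. It is a single pass over the window, which is why doing it inside the main search adds almost nothing to the response time.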

JIRA j...@apache.org wrote:
>
> [ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754509#action_12754509 ]
>
> Uri Boness commented on SOLR-1311:
> ----------------------------------
>
> Wouldn't it be an idea to try and merge this code with the original field
> collapsing patch? Quite a bit of work was done recently on that patch to
> make it more extensible. So for example, you now have a _Collapser_
> interface that encapsulates the actual collapsing algorithm, and my guess
> is that your algorithm can probably fit there. Indeed, when the corpus is
> large, adjacent field collapsing can turn into a performance issue, and
> having this pseudo algorithm seems to make a lot of sense. So for example,
> using the original field collapsing patch, it would be nice if we could
> just define another parameter called collapse.type which will hold one of
> three values: adjacent, pseudo-adjacent, and non-adjacent.
>
> BTW, I haven't looked at your patch yet and I don't know how well it works
> with faceting. But integrating it with the original patch will give you
> that support (i.e. before/after collapse facet counts) automatically.
>
>> pseudo-field-collapsing
>> -----------------------
>>
>> Key: SOLR-1311
>> URL: https://issues.apache.org/jira/browse/SOLR-1311
>> Project: Solr
>> Issue Type: New Feature
>> Components: search
>> Affects Versions: 1.4
>> Reporter: Marc Sturlese
>> Fix For: 1.5
>>
>> Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>
>>
>> I am trying to develop a new way of doing field collapsing based on the
>> adjacent field collapsing algorithm. I started developing it because
>> I am experiencing performance problems with the field collapsing patch
>> on a big index (8G).
>> The algorithm does adjacent-pseudo-field collapsing. It does collapsing on
>> the first X documents. Instead of making the collapsed docs disappear, the
>> algorithm sends them to a given position in the relevance results list.
>> The reason I only do collapsing in the first X documents is that if I
>> have, for example, 600000 results and I am showing 10 results per page, I
>> really don't need to do collapsing on page 30000, or even on page 3000.
>> Doing this, I am noticing dramatically better performance. The problem is
>> that I couldn't find a way to plug the algorithm in as a component and
>> keep good performance. I had to hack a few classes in
>> SolrIndexSearcher.java.
>> This patch is just experimental and for testing purposes. In case someone
>> finds it interesting, it would be good to find a way to integrate it in a
>> better way than it is at the moment.
>> Advice is more than welcome.
>>
>> Functionality:
>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>> <str name="plus.considerMoreDocs">true</str>
>> <str name="plus.considerHowMany">3000</str>
>> <str name="plus.considerField">name</str>
>> (at the moment there's no threshold or any of the other parameters that
>> exist in the current collapse-field patch)
>> plus.considerMoreDocs enables pseudo-collapsing
>> plus.considerHowMany sets the number of result documents to which we
>> want to apply the algorithm
>> plus.considerField is the field to do pseudo-collapsing on
>> If the number of results is lower than plus.considerHowMany, the algorithm
>> will be applied to all the results.
>> Let's say there is a query with 600000 results and we've set
>> considerHowMany to 3000 (and we already have the docs sorted by
>> relevance).
>> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be
>> collapsed, it will be sent to pos 2999 of the relevance results array. If
>> the 3rd has to be collapsed too, it will go to position 2998, and so on.
>> The algorithm is not applied when a sortspec is set or
>> plus.considerMoreDocs is set to false. Nor is it applied when using
>> MoreLikeThisRequestHandler.
>>
>> Example with a query of 9 results:
>>
>> Results sorted by relevance without pseudo-collapse-algorithm:
>> doc1 - collapse_field_value 3
>> doc2 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>>
>> Results pseudo-collapsed with plus.considerHowMany = 5
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc2 - collapse_field_value 3*
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>>
>> Results pseudo-collapsed with plus.considerHowMany = 9
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> doc6 - collapse_field_value 6*
>> doc2 - collapse_field_value 3*
>>
>> *pseudo-collapsed documents
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-1311%29-pseudo-field-collapsing-tp24684526p25432571.html
Sent from the Solr - Dev mailing list archive at Nabble.com.