I'm not sure that will work. Say we have a SearchComponent that does the following in its process() method:

1. DocList docs = rb.getResults().docList;

2. Go over docs, and for each doc do steps 3 to 5.

3. Construct a query that gets all docs which are not the current one and are from a different host (we are dealing with web pages):

    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("host", host)), BooleanClause.Occur.MUST_NOT);
    q.add(new TermQuery(new Term("id", name)), BooleanClause.Occur.MUST_NOT);
    // TODO: how to set a proper limit instead of the hard-coded 1000
    DocListAndSet sim = searcher.getDocListAndSet(q, (TermQuery) null, null, 0, 1000);

4. For all docs in sim, calculate the similarity to the current doc (from #2).

5. Count all similar documents and add a new field:

    FieldType ft = new FieldType();
    ft.setStored(true);
    ft.setIndexed(true);
    Field f = new IntField("similarCount", ds.size(), ft);
    d.add(f);

Now the problem is with #1: the doc list comes in already sorted. That is, if I call Solr with q=*&sort=similarityCount, the sort is applied before the last component is called, and that component is the one that does all the steps above. If I add the component to first-components instead, the call in #1 returns null.

A completely different approach would be to calculate the aggregate values at update time via an UpdateRequestProcessor. But then I need to be able to run searches in the update processor (step #3), and in that case docs are available to the searcher only after a commit. I'd expect the following to work, but the search always returns 0:

    public void processCommit(CommitUpdateCommand cmd) throws IOException {
        TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 100);
        DocListAndSet sim = searcher.getDocListAndSet(new MatchAllDocsQuery(), (TermQuery) null, null, 0, 10);
        DocList docs = sim.docList; // <---- is always empty
    }

(I tried placing it after solr.RunUpdateProcessorFactory in the update chain; no change.)

Even if the searcher worked, it looks bad.
Because in this case I would need to update not only the incoming document but also all the documents that are similar to it (that is, if A is similar to B and C, then B and C are similar to A, and the similarCount field has to be increased in B and C as well).

________________________________
From: Koji Sekiguchi <k...@r.email.ne.jp>
To: solr-user@lucene.apache.org
Sent: Thursday, July 18, 2013 4:29 PM
Subject: Re: Sort by document similarity counts

> I have tried doing this via custom SearchComponent, where I can find all
> similar documents for each document in current search result, then add a new
> field into document hoping to use sort parameter (q=*&sort=similarityCount).

I don't understand this part very well, but:

> But this will not work because sort is done before handling my custom search
> component, if added via last-components. Can't add it via first-components,
> because then I will have no access to query results. And I do not want to
> override QueryComponent because I need to have all the functionality it
> covers: grouping, facets, etc.

You may want to put your custom SearchComponent to last-component
and inject SortSpec in your prepare() so that QueryComponent can sort
the result complying with your SortSpec?

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
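
As an aside on the symmetric bookkeeping raised at the top of this message (if A is similar to B and C, then B and C must count A too), a minimal plain-Java sketch follows. It is not Solr code: the Page class, the similar() placeholder (exact text equality) and similarCounts() are hypothetical names made up for illustration, and it assumes all pages fit in memory.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilarCounts {

    /** Hypothetical stand-in for an indexed web page; not a Solr class. */
    static final class Page {
        final String id;
        final String host;
        final String text;

        Page(String id, String host, String text) {
            this.id = id;
            this.host = host;
            this.text = text;
        }
    }

    /** Placeholder similarity test (assumption): exact text equality. */
    static boolean similar(Page a, Page b) {
        return a.text.equals(b.text);
    }

    /** Returns id -> number of similar pages that live on a different host. */
    static Map<String, Integer> similarCounts(List<Page> pages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Page p : pages) {
            counts.put(p.id, 0);
        }
        // Visit each unordered pair once and credit both sides,
        // so the counts stay symmetric.
        for (int i = 0; i < pages.size(); i++) {
            for (int j = i + 1; j < pages.size(); j++) {
                Page a = pages.get(i);
                Page b = pages.get(j);
                if (a.host.equals(b.host)) {
                    continue; // same host is excluded, like the MUST_NOT host clause
                }
                if (similar(a, b)) {
                    counts.merge(a.id, 1, Integer::sum);
                    counts.merge(b.id, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Incrementing both sides of each pair avoids a second pass to fix up B and C, at the cost of comparing every pair of pages, which is quadratic in the number of pages.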