Not sure if it will work. Say we have a SearchComponent that does the 
following in its process() method:

1. DocList docs = rb.getResults().docList;

2. Go over docs and for each doc do:

3. Construct a query which gets all docs which are not equal to the current 
one and are from a different host (we are dealing with web pages here):
BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("host", host)), BooleanClause.Occur.MUST_NOT);
q.add(new TermQuery(new Term("id", name)), BooleanClause.Occur.MUST_NOT);
// TODO: how to set a proper limit instead of the hard-coded 1000
DocListAndSet sim = searcher.getDocListAndSet(q, (TermQuery) null, null, 0, 1000);

4. For all docs in sim, calculate the similarity to the current doc (from #2)

5. Count all similar documents and add a new field
            FieldType ft = new FieldType();
            ft.setStored(true);
            ft.setIndexed(true);
            Field f = new IntField("similarCount", ds.size(), ft);
            d.add(f);
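
Steps 2 to 5 can be sketched in plain Java, outside Solr, to make the counting logic concrete. Everything here is an assumption made for illustration: the Page class, the similarCounts helper, the threshold, and above all the Jaccard term overlap, which stands in for whatever similarity measure step 4 actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SimilarCountSketch {

    /** Hypothetical stand-in for an indexed web page. */
    static final class Page {
        final String id;
        final String host;
        final Set<String> terms;

        Page(String id, String host, Set<String> terms) {
            this.id = id;
            this.host = host;
            this.terms = terms;
        }
    }

    /** Jaccard term overlap: an assumed similarity measure, not the author's. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** For each result, count index pages on another host that are similar enough. */
    static Map<String, Integer> similarCounts(List<Page> results, List<Page> index,
                                              double threshold) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Page current : results) {                        // step 2
            int similar = 0;
            for (Page other : index) {
                // step 3: skip the current doc itself and docs from the same host
                if (other.id.equals(current.id) || other.host.equals(current.host)) {
                    continue;
                }
                // step 4: compare against the current doc
                if (jaccard(current.terms, other.terms) >= threshold) {
                    similar++;
                }
            }
            counts.put(current.id, similar);                  // step 5
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page("a", "h1", new HashSet<>(Arrays.asList("x", "y", "z"))),
            new Page("b", "h2", new HashSet<>(Arrays.asList("x", "y", "z"))),
            new Page("c", "h3", new HashSet<>(Arrays.asList("q"))));
        System.out.println(similarCounts(pages, pages, 0.5));
    }
}
```

The nested loop makes the cost visible: one pass over the candidate set per result document, which is why the hard-coded limit in step 3 matters.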


Now the problem is with #1: the results come in already sorted. That is, if I 
call Solr with q=*&sort=similarityCount, the sort is applied before the last 
component, which performs all the steps above, gets called. If I add the 
component to first-components instead, then the call in #1 returns null.
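
To make the ordering problem concrete: once the counts exist, a component running last can still reorder the documents it has in hand, but only those. A minimal sketch in plain Java, assuming the ids of the current page and a map of computed counts are available (PageResortSketch and sortByCountDesc are hypothetical names, not Solr API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PageResortSketch {

    /**
     * Returns the ids of the current page ordered by count, highest first.
     * Ids without a computed count are treated as 0. The sort is stable,
     * so ties keep the order in which QueryComponent returned them.
     */
    static List<String> sortByCountDesc(List<String> pageIds,
                                        Map<String, Integer> similarCount) {
        List<String> sorted = new ArrayList<>(pageIds);
        sorted.sort(Comparator
            .comparingInt((String id) -> similarCount.getOrDefault(id, 0))
            .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);
        System.out.println(sortByCountDesc(Arrays.asList("a", "b", "c"), counts));
    }
}
```

This reorders only the rows already fetched for the page; documents with a higher count that fell outside the page never appear, so it is not a substitute for sorting inside the query itself.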


A completely different approach would be to calculate the aggregate values at 
update time via an UpdateRequestProcessor. But then I need to be able to run 
searches inside the update processor (step #3), and in that case the docs are 
available to the searcher only after a commit. I'd expect the following to 
work, but the search always returns 0 results:

public void processCommit(CommitUpdateCommand cmd) throws IOException {
    TopDocs top = searcher.search(new MatchAllDocsQuery(), 100);
    DocListAndSet sim = searcher.getDocListAndSet(
        new MatchAllDocsQuery(), (TermQuery) null, null, 0, 10);
    DocList docs = sim.docList; // <---- is always empty

(I tried placing it after solr.RunUpdateProcessorFactory in the update chain; 
no change.)

Even if the searcher worked, this approach looks bad, because I would need to 
update not only the incoming document but also all the documents that are 
similar to it (that is, if A is similar to B and C, then B and C are similar 
to A, and the similarCount field has to be increased in B and C as well).



________________________________
 From: Koji Sekiguchi <k...@r.email.ne.jp>
To: solr-user@lucene.apache.org 
Sent: Thursday, July 18, 2013 4:29 PM
Subject: Re: Sort by document similarity counts
 

> I have tried doing this via custom SearchComponent, where I can find all 
> similar documents for each document in current search result, then add a new 
> field into document hoping to use sort parameter (q=*&sort=similarityCount).

I don't understand this part very well, but:

> But this will not work because sort is done before handling my custom search 
> component, if added via last-components. Can't add it via first-components, 
> because then I will have no access to query results. And I do not want to 
> override QueryComponent because I need to have all the functionality it 
> covers: grouping, facets, etc.

You may want to put your custom SearchComponent in last-components and inject 
a SortSpec in your prepare() so that QueryComponent can sort the result 
complying with your SortSpec?

koji
-- 
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
