> Date: Mon, 5 Nov 2007 14:55:21 -0500
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Phrase Query Performance Question and score threshold
>
> On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:
> > If I limit the documents returned based on a score threshold (filter by
> > score) will it be able to improve query performance?
>
> No.
>
> Taking a different approach can really speed up queries though.
> To figure out what approach you should take, we need to know what you
> are trying to do.
> As Hoss said: http://people.apache.org/~hossman/#xyproblem
>
> How many different phrase queries are you having performance issues with?
>
> -Yonik
 
 
 
Thanks for replying Yonik.  
 
Out of strong curiosity, I was trying to implement a search application that 
my colleague has already built very successfully. I tried to use SOLR to build 
the same application and see if it works. Basically, there are millions of 
documents. They are categorized, and the content of each document is 
constructed by a program using its category as input. A search application 
searches the content and brings up the document. This way of constructing 
documents has proven to be excellent in terms of relevancy. Of course, it 
relies on slop phrase queries. Now I want to build something that can search 
the content and bring up the documents fast. That is basically what I want to do. 
 
I can't go into any more detail on how the document content was constructed 
because the company I work for has a patent pending on it, and I dare not 
discuss it in public. But the way it was constructed seems to be the reason 
why the document frequency is so high (for many phrases) and why a search 
usually brings up a large result set. The top-scoring documents have very good 
relevancy, though. So I am facing two issues. One is to make the slop phrase 
queries faster; the second is to make the result set smaller. 
 
Using a score threshold may solve the second issue. It would be great if you 
could point me to how to achieve that. 
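
To make the score-threshold idea concrete, here is a minimal sketch of trimming a result set on the client side. It assumes the hits have already been fetched together with their scores (for example by asking Solr for the score field with fl=*,score); the function name, the ratio parameter, and the sample hits are hypothetical and only for illustration. As Yonik noted above, this kind of filtering only shrinks the result set. It does not make the phrase query itself any faster, since every matching document still has to be scored.

```python
def filter_by_relative_score(hits, ratio=0.5):
    """Keep hits whose score is at least `ratio` times the best score.

    `hits` is a list of (doc_id, score) pairs, as a client might collect
    them from a search response. The input order is preserved.
    """
    if not hits:
        return []
    # The cutoff is relative to the best score, because raw scores are
    # not comparable from one query to the next.
    top_score = max(score for _, score in hits)
    cutoff = top_score * ratio
    return [(doc_id, score) for doc_id, score in hits if score >= cutoff]


# Hypothetical result list, for illustration only.
hits = [("doc7", 4.2), ("doc3", 3.9), ("doc9", 1.1), ("doc1", 0.4)]
kept = filter_by_relative_score(hits, ratio=0.5)
print(kept)
```

A relative cutoff is used here rather than an absolute one because the same absolute score can mean a strong match for one query and a weak match for another.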
 
As for the first issue: I have found about 10 different phrase queries with 
performance issues so far. I believe there will be a lot more that I just 
haven't tried. It can be solved by using faster hardware, though. I also 
believe it would help if SOLR had a distributed search architecture similar 
to NUTCH's, so that it could scale out instead of scaling up. 
 
 
 
 
Thanks a lot
Haishan