> Date: Mon, 5 Nov 2007 14:55:21 -0500
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Phrase Query Performance Question and score threshold
>
> On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:
> > If I limit the documents returned based on a score threshold (filter by
> > score) will it be able to improve query performance?
>
> No.
>
> Taking a different approach can really speed up queries though.
> To figure out what approach you should take, we need to know what you
> are trying to do.
> As Hoss said: http://people.apache.org/~hossman/#xyproblem
>
> How many different phrase queries are you having performance issues with?
>
> -Yonik

Thanks for replying, Yonik.
Out of curiosity, I am trying to reimplement a search application that my
colleague has already built very successfully. I am using SOLR to build the
same application to see whether it works. Basically, there are millions of
documents. They are categorized, and the content of each document is
constructed by a program using its category as input. The search application
searches the content and brings up the matching documents. This way of
constructing documents has proven to be excellent in terms of relevancy. Of
course, it relies on slop phrase queries. Now I want to build something that
can search the content and bring up the documents fast. That is basically what
I want to do.
I can't go into any more detail on how the document content is constructed
because the company I work for has a patent pending on it, so I dare not
discuss it in public. But the way it is constructed seems to be the reason why
document frequency is so high (for many phrases) and why a search usually
brings up a large result set. The top-scoring documents have very good
relevancy, though. So I am facing two issues: one is to make the slop phrase
queries faster, and the second is to make the result set smaller.
Using a score threshold may solve the second issue. It would be great if you
could point me to how to achieve that.
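
For illustration, a score threshold could be applied on the client side
roughly as below. This is only a sketch under stated assumptions: the hits are
assumed to have already come back as (doc_id, score) pairs sorted by
descending score, and the function name and the 0.5 ratio are hypothetical,
not part of SOLR. As the reply quoted above says, a cut like this makes the
result set smaller but does not make the query itself faster.

```python
def filter_by_score(results, min_ratio=0.5):
    """Keep hits whose score is at least min_ratio times the top score.

    results: list of (doc_id, score) pairs, sorted by descending score.
    min_ratio: hypothetical cutoff relative to the best hit (an assumption).
    """
    if not results:
        return []
    top_score = results[0][1]          # best hit comes first
    cutoff = top_score * min_ratio     # relative threshold, not absolute
    return [(doc_id, score) for doc_id, score in results if score >= cutoff]


# Made-up hits for illustration only.
hits = [("doc1", 4.0), ("doc2", 3.0), ("doc3", 1.0)]
print(filter_by_score(hits))
```

The cutoff is relative to the top score because raw scores are generally not
comparable from one query to the next.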
As for the first issue: so far I have found about 10 different phrase queries
that have performance issues. I believe there will be a lot more that I just
haven't tried. It can be solved by using faster hardware, though. I also
believe it would help if SOLR had a distributed search architecture similar to
NUTCH's, so that it could scale out instead of scaling up.
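
The scale-out idea can be sketched as scatter-gather: each shard searches its
own slice of the index and a front end merges the per-shard top hits. The
sketch below shows only the merge step, with hypothetical names and made-up
data. It is not how NUTCH or SOLR implement it, and it assumes the scores are
comparable across shards.

```python
import heapq
from itertools import chain


def merge_shard_results(shard_results, k=10):
    """Merge per-shard hit lists into one global top-k list.

    shard_results: iterable of lists of (doc_id, score) pairs, one per shard.
    Assumes scores from different shards are directly comparable.
    """
    all_hits = chain.from_iterable(shard_results)
    # nlargest returns the k best hits, sorted by descending score.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


# Made-up per-shard results for illustration only.
shard_a = [("a1", 5.0), ("a2", 2.0)]
shard_b = [("b1", 4.0), ("b2", 3.0)]
print(merge_shard_results([shard_a, shard_b], k=3))
```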
Thanks a lot
Haishan