Re: Phrase Query Performance Question

Mike Klaas Thu, 01 Nov 2007 10:27:25 -0800

On 31-Oct-07, at 11:54 PM, Haishan Chen wrote:

> Date: Wed, 31 Oct 2007 17:54:53 -0700
> Subject: Re: Phrase Query Performance Question
> From: [EMAIL PROTECTED]
> To: solr-[EMAIL PROTECTED]
>
> "hurricane katrina" is a very expensive query against a collection
> focused on Hurricane Katrina. There will be many matches in many
> documents. If you want to measure worst-case, this is fine.
>
> I'd try other things, like:
>
> * ninth ward
> * Ray Nagin
> * Audubon Park
> * Canal Street
> * French Quarter
> * FEMA mistakes
> * storm surge
> * Jackson Square
>
> Of course, real query logs are the only real test.
>
> wunder
These terms are not frequent in my index. I believe they are going to be fast. The thing is that I feel 2 million documents is a small index. 100,000 or 200,000 hits is a small set and should always have sub-second query performance. Now I am only querying one field and the response is almost one second. I feel I can't achieve sub-second performance if I add a bit more complexity to the query.
Many of the category terms in my index will appear in more than 5% of the documents and those category terms are very popular search terms. So the examples I gave were not extreme cases for my index.

I think that you are somewhat misguided about what constitutes a small set. A query term that appears in 5-10% of the index in a natural language corpus is _extremely_ frequent. Not quite on the order of stopwords, but getting there. As a comparison, on an extremely large corpus that I have handy, documents containing both the word 'auto' and 'repair' (not necessarily adjacent) constitute 0.1% of the index. The frequency of the phrase "auto repair" is 0.025%.


At that phrase frequency, 200k docs would be the result set from an 800-million-doc corpus.
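The arithmetic behind this comparison can be sketched roughly as follows. The helper names are hypothetical, and the figures are only the ones quoted in this thread (a 2-million-doc index, 100,000 to 200,000 hits, and a 0.025% phrase frequency), so treat it as an illustration rather than a measurement.

```python
def hit_fraction(hits: int, total_docs: int) -> float:
    """Fraction of the index matched by a query."""
    return hits / total_docs


def corpus_size_for(hits: int, fraction: float) -> float:
    """Corpus size at which `fraction` of the docs yields `hits` matches."""
    return hits / fraction


# Figures from the thread: 100,000 to 200,000 hits in a 2-million-doc index.
low = hit_fraction(100_000, 2_000_000)
high = hit_fraction(200_000, 2_000_000)

# The phrase "auto repair" matches 0.025% of the large comparison corpus.
phrase_fraction = 0.025 / 100

# How large would that corpus have to be for the phrase to return 200k docs?
equivalent_corpus = corpus_size_for(200_000, phrase_fraction)

print(f"hit rate in the small index: {low:.0%} to {high:.0%}")
print(f"equivalent corpus size: {equivalent_corpus:,.0f} docs")
```

In other words, a result set that is 5-10% of a 2-million-doc index corresponds to what a typical two-word phrase would return from a corpus several hundred times larger.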

What data are you indexing, and what is the intended effect of the phrase queries you are performing? Perhaps getting at the issue from this end would be more productive than hammering at the phrase query performance question.

When I started Tomcat I saw this message:
The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path
Does that mean that if I use the Apache Tomcat Native library, query performance will be better? Does anyone have experience with that?

Unlikely, though it might help you slightly at a high query rate with high cache hit ratios.


-Mike
