Slow queries with lots of hits

2008-12-04 Thread Tim Sturge
Hi all, I have an interesting problem with my query traffic. Most of the queries run in a fairly short amount of time (< 100ms) but a few take over 1000ms. These queries are predominantly those with a huge number of hits (>1 million hits in a >100 million document index). The time taken (as far as

Re: Slow queries with lots of hits

2008-12-04 Thread Erick Erickson
The problem here is how *could* a system return even the top 10,000 results without scoring them all? What if the millionth hit resulted in the very best match in the entire corpus? That said, sorting may well be the issue here rather than scoring. You can use a TopDocCollector to get the top N ma
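To illustrate the point about scoring every hit, here is a minimal sketch in plain Java of top-N collection with a bounded min-heap. It is not Lucene's TopDocCollector; the names `TopNSketch` and `topN` and the score array are hypothetical. Every score is still examined once, but only N entries are retained, so the work is roughly O(hits x log N) rather than a full sort of all hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; this is not Lucene code.
class TopNSketch {
    // Returns the n highest scores in descending order.
    // scores[i] stands in for the score of document i.
    static List<Float> topN(float[] scores, int n) {
        List<Float> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the weakest retained score sits at the head.
        PriorityQueue<Float> heap = new PriorityQueue<>(n);
        for (float s : scores) {            // every hit is examined once
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {   // beats the weakest retained hit
                heap.poll();
                heap.add(s);
            }
        }
        result.addAll(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }
}
```

Because the best match may be the very last hit visited, the loop cannot stop early without risking a wrong top N.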

Re: Slow queries with lots of hits

2008-12-04 Thread Tim Sturge
That makes sense. I should be more precise in that all I need is 100 of the 1 million "reasonable" results. The concern I would have with a TopDocCollector is that this is biased towards the top of the index, which for me translates into a bias toward older documents. I'd prefer no age bias or a newer doc

Re: Slow queries with lots of hits

2008-12-04 Thread Erick Erickson
Huh? TopDocCollector isn't biased unless you suppose that you'll have many documents scoring *exactly* the same. You collect the top N scoring documents. Actually, I think this is all pretty much done for you with the Searcher.search(Query query, Filter filter, int n) method. You can pass null for
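The tie case mentioned above can be sketched in plain Java. This is not Lucene's code: `TieSketch`, `Hit`, and `topDocIds` are hypothetical names, and the tie-break rule used here (among equal scores, the lower document id is preferred) is an assumption made for illustration. Under that assumption, any bias toward earlier documents appears only among hits whose scores are exactly equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; this is not Lucene code.
class TieSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // "Worst first": lowest score at the head of the heap.
    // Assumed tie-break: among equal scores, the higher doc id is worse.
    static final Comparator<Hit> WORST_FIRST = new Comparator<Hit>() {
        public int compare(Hit a, Hit b) {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        }
    };

    // Returns the doc ids of the n best hits, best first.
    static List<Integer> topDocIds(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), WORST_FIRST);
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(new Hit(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll();  // evict the current worst hit
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(WORST_FIRST.reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) {
            ids.add(h.doc);
        }
        return ids;
    }
}
```

With distinct scores the document order has no effect on the result; with all scores equal, the first n documents are the ones kept.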

Re: Slow queries with lots of hits

2008-12-04 Thread John Wang
Tim: How about implementing your own HitCollector and stopping once you have collected 100 docs with a score above a certain threshold? BTW, are there lotsa concurrent searches? -John
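The early-stop idea can be sketched without Lucene as follows. The names `EarlyStopSketch` and `collectAboveThreshold` and the score array are hypothetical; a real HitCollector would receive (doc, score) pairs from the searcher and would need its own way to abort the search. In this sketch hits are visited in document-id order, so stopping early keeps the lowest-numbered qualifying documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not Lucene code.
class EarlyStopSketch {
    // Walks hits in doc-id order and stops as soon as `limit` documents
    // scoring strictly above `threshold` have been collected.
    static List<Integer> collectAboveThreshold(float[] scores, float threshold, int limit) {
        List<Integer> collected = new ArrayList<>();
        for (int doc = 0; doc < scores.length && collected.size() < limit; doc++) {
            if (scores[doc] > threshold) {
                collected.add(doc);
            }
        }
        return collected;
    }
}
```

The trade-off is that the result is "good enough" rather than the true top N: a higher-scoring document later in the index is never seen once the limit is reached.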

Re: Slow queries with lots of hits

2008-12-04 Thread Otis Gospodnetic
can come up with a generic way to do 1) ? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Slow queries with lots of hits

2008-12-04 Thread Karl Wettin
Hi Tim, is it possible that the slow queries contain terms that are very common in your index? If so, you could replace those clauses with a filter. This would change the score, since filters do not contribute to scoring, but if your query contains enough other clauses that should not be a problem.
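The filter idea can be sketched in plain Java with java.util.BitSet rather than Lucene's Filter API. The very common term becomes a precomputed bit set that only restricts which documents are considered, while the score comes entirely from the remaining clauses. All names here (`FilterSketch`, `scoreWithFilter`, the map of per-document scores) are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; this is not Lucene code.
class FilterSketch {
    // rareClauseScores: doc id -> score produced by the selective clauses.
    // commonTermDocs:   docs containing the common term; it can be computed
    //                   once and reused across queries.
    static Map<Integer, Float> scoreWithFilter(Map<Integer, Float> rareClauseScores,
                                               BitSet commonTermDocs) {
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Float> e : rareClauseScores.entrySet()) {
            // Membership test only: the common term adds nothing to the score.
            if (commonTermDocs.get(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

In this sketch the loop runs over the documents matched by the selective clauses, not over the much larger set matched by the common term.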

Re: Slow queries with lots of hits

2008-12-05 Thread Tim Sturge