Re: What's the bottleneck?

Otis Gospodnetic Fri, 12 Sep 2008 15:12:16 -0700

Jason, you could also post what the final query looks like (after dismax chews 
on it) - use &debugQuery=true and let's see if there is anything strange there.
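
For instance, here is a rough sketch of building such a request. The host, handler path, and qf field names are made up for illustration, and defType=dismax is an assumption about how dismax is selected (some setups pick the handler with qt instead). Only mm=1, the status:0 filter, and debugQuery=true come from this thread.

```python
from urllib.parse import urlencode

# Hypothetical host and handler path; adjust for your own Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_debug_url(user_query, base=SOLR_SELECT):
    """Build a dismax request URL that asks Solr for debug output."""
    params = [
        ("q", user_query),
        ("defType", "dismax"),            # assumption: how dismax is selected
        ("qf", "title^2.0 description"),  # hypothetical fields and boosts
        ("mm", "1"),                      # mm=1, as described in the thread
        ("fq", "status:0"),               # filter on good docs, per the thread
        ("fl", "id,score"),               # hypothetical field list
        ("debugQuery", "true"),           # ask Solr to include debug info
    ]
    return base + "?" + urlencode(params)


url = build_debug_url("shirt")
print(url)
```

The parsed query should then show up in the debug section of the response, which is the part worth posting.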


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jason Rennie <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 12, 2008 2:17:28 PM
> Subject: Re: What's the bottleneck?
> 
> Thanks for all the replies!
> 
> Mike: we're not using pf.  Our fq is always "status:0".  The "status" field
> is "0" for all good docs (90%+) and some other integer for any docs we don't
> want returned.
> 
> Jeyrl: federated search is definitely something we'll consider.
> 
> On Fri, Sep 12, 2008 at 8:39 AM, Grant Ingersoll wrote:
> 
> > The bottleneck may simply be there are a lot of docs to score since you are
> > using fairly common terms.
> 
> 
> Yeah, I'm coming to the realization that it may be as simple as that.  Even
> a short, simple query like "shirt" can take seconds to return, presumably
> because it hits ("numFound") 2 million docs.
> 
> 
> > Also, what file format (compound, non-compound) are you using?  Is it
> > optimized?  Have you profiled your app for these queries?  When you say the
> > "query is longer", define "longer"...  5 terms?  50 terms?  Do you have lots
> > of deleted docs?  Can you share your DisMax params?  Are you doing wildcard
> > queries?  Can you share the syntax of one of the offending queries?
> 
> 
> I think we're using the non-compound format.  We see eight different files
> (fdt, fdx, fnm, etc.) in an optimized index.  Yes, it's optimized.  It's
> also read-only---we don't update/delete.  DisMax: we specify qf, fl, mm, fq;
> mm=1; we use boosts for qf.  No wildcards.  Example query: "shirt"; takes 2
> secs to run according to the solr log, hits 2 million docs.
> 
> 
> > Since you want to keep "stopwords", you might consider a slightly better
> > use of them, whereby you use them in n-grams only during query parsing.
> 
> 
> Not sure what you mean here...
> 
> 
> > See also https://issues.apache.org/jira/browse/LUCENE-494 for related
> > stuff.
> >
> 
> Thanks for the pointer.
> 
> Jason
