Re: What's the bottleneck?

Jason Rennie Fri, 12 Sep 2008 11:18:10 -0700

Thanks for all the replies!

Mike: we're not using pf.  Our fq is always "status:0".  The "status" field
is "0" for all good docs (90%+) and some other integer for any docs we don't
want returned.

Jeyrl: federated search is definitely something we'll consider.

On Fri, Sep 12, 2008 at 8:39 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> The bottleneck may simply be there are a lot of docs to score since you are
> using fairly common terms.

Yeah, I'm coming to the realization that it may be as simple as that.  Even
a short, simple query like "shirt" can take seconds to return, presumably
because it hits ("numFound") 2 million docs.

> Also, what file format (compound, non-compound) are you using?  Is it
> optimized?  Have you profiled your app for these queries?  When you say the
> "query is longer", define "longer"...  5 terms?  50 terms?  Do you have lots
> of deleted docs?  Can you share your DisMax params?  Are you doing wildcard
> queries?  Can you share the syntax of one of the offending queries?

I think we're using the non-compound format.  We see eight different files
(fdt, fdx, fnm, etc.) in an optimized index.  Yes, it's optimized.  It's
also read-only: we never update or delete docs.  DisMax: we specify qf, fl,
mm, and fq; mm=1; we use boosts for qf.  No wildcards.  Example query:
"shirt"; it takes 2 secs to run according to the Solr log and hits 2 million
docs.
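
To make the parameter set concrete, here is a rough sketch of how such a
request could be assembled.  The host, path, field names ("title",
"description"), boost values, and fl list are placeholders I made up for
illustration, not our real config; only mm=1, the "status:0" filter, and the
"shirt" query come from the description above.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query):
    """Build a Solr select URL using DisMax-style parameters.

    The field names and boosts below are hypothetical placeholders.
    """
    params = [
        ("defType", "dismax"),                        # select the DisMax query parser
        ("q", user_query),                            # raw user query, e.g. "shirt"
        ("qf", "title^2.0 description^0.5"),          # boosted query fields (made up)
        ("fl", "id,score"),                           # fields to return (made up)
        ("mm", "1"),                                  # at least one clause must match
        ("fq", "status:0"),                           # filter: only "good" docs
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and path.
    print(build_dismax_url("http://localhost:8983/solr/select", "shirt"))
```

Note that mm=1 means any doc containing at least one query term is a
candidate, so a common single-term query still has to score every match.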

> Since you want to keep "stopwords", you might consider a slightly better
> use of them, whereby you use them in n-grams only during query parsing.

Not sure what you mean here...

> See also https://issues.apache.org/jira/browse/LUCENE-494 for related
> stuff.

Thanks for the pointer.

Jason
