Re: Optimizing fq query performance

Shawn Heisey Sun, 14 Apr 2019 12:14:27 -0700

On 4/13/2019 12:58 PM, John Davis wrote:

We noticed a sizable performance degradation when we add certain fq filters
to the query even though the result set does not change between the two
queries. I would've expected solr to optimize internally by picking the
most constrained fq filter first, but maybe my understanding is wrong.

All filters cover the entire index, unless the query parser that you're using implements the PostFilter interface, the filter cost is set high enough, and caching is disabled. All three of those conditions must be met in order for a filter to only run on results instead of the entire index.


http://yonik.com/advanced-filter-caching-in-solr/
https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/

Most query parsers don't implement the PostFilter interface. The lucene and edismax parsers do not implement PostFilter. Unless you've specified the query parser in the fq parameter, it will use the lucene query parser, and it cannot be a PostFilter.
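
To make the three conditions concrete, here is a rough Python sketch that only builds the request string; it does not talk to Solr. It assumes the frange parser (which, as far as I know, does implement PostFilter), a hypothetical numeric field named field3, and a cost of 200, on the understanding that Solr treats a cost of 100 or more as "high enough". The helper name build_query_string is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs


def build_query_string(q, filters):
    """URL-encode a Solr request with one fq parameter per filter."""
    return urlencode([("q", q)] + [("fq", f) for f in filters])


# Default case: no parser is named, so the lucene parser handles the filter
# and it is evaluated against the whole index.
plain_fq = "field2:value"

# Candidate post filter (assumptions: frange parser, hypothetical numeric
# field "field3", cost of 200). All three conditions appear in the local
# params: a PostFilter-capable parser, a high cost, and cache=false.
post_fq = "{!frange l=0 u=10 cache=false cost=200}field3"

query_string = build_query_string("*:*", [plain_fq, post_fq])

# Decoding the string again shows both filters survive URL encoding intact.
decoded_filters = parse_qs(query_string)["fq"]
```

If any one of the three pieces is missing from the second filter, it would be expected to run like the first one, against the entire index.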

Here's an example:

query1: fq = 'field1:* AND field2:value'
query2: fq = 'field2:value'

If the point of the "field1:*" query clause is "make sure field1 exists in the document" then you would be a lot better off with this query clause:


field1:[* TO *]

This is an all-inclusive range query. It works with all field types where I have tried it, and that includes TextField types. It will be a lot more efficient than the wildcard query.
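
If the filter strings are assembled in client code, the substitution can be applied before the request is sent. The following is a minimal Python sketch, not anything Solr provides: the function name use_range_for_exists is hypothetical, and the regular expression is deliberately naive (it does not understand quoted phrases or escaped characters), so treat it as an illustration only.

```python
import re

# Matches a bare "field:*" clause: a field name, a colon, and a lone "*"
# that is followed by whitespace, a closing parenthesis, or end of string.
# Clauses such as "field:val*" and "*:*" are left alone.
_EXISTS_WILDCARD = re.compile(r"(?<![\w.])([A-Za-z_][\w.]*):\*(?=\s|\)|$)")


def use_range_for_exists(fq):
    """Rewrite 'field:*' existence checks as 'field:[* TO *]'."""
    return _EXISTS_WILDCARD.sub(r"\1:[* TO *]", fq)


rewritten = use_range_for_exists("field1:* AND field2:value")
# rewritten is now "field1:[* TO *] AND field2:value"
```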

Here's what happens with "field1:*". If the cardinality of field1 is ten million different values, then the query that gets constructed for Lucene will literally contain ten million values. And every single one of them will need to be compared to every document. That's a LOT of comparisons. Wildcard queries are normally very slow.


Thanks,
Shawn
