On 4/13/2019 12:58 PM, John Davis wrote:
We noticed a sizable performance degradation when we add certain fq filters
to the query even though the result set does not change between the two
queries. I would've expected solr to optimize internally by picking the
most constrained fq filter first, but maybe my understanding is wrong.

All filters are evaluated against the entire index unless three conditions are met: the query parser that you're using implements the PostFilter interface, the filter cost is set high enough (100 or more), and caching is disabled for that filter. All three of those conditions must be met in order for a filter to only run on results instead of the entire index.

http://yonik.com/advanced-filter-caching-in-solr/
https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/

Most query parsers don't implement the PostFilter interface. The lucene and edismax parsers do not implement PostFilter. Unless you've specified a different query parser in the fq parameter, Solr will use the lucene query parser, so the filter cannot be a PostFilter.
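
To illustrate the three conditions, here is a minimal Python sketch that builds request parameters with one ordinary filter and one filter that is eligible to run as a post filter. It assumes the frange parser (which does implement PostFilter) and a hypothetical numeric field named "popularity"; the helper name post_filter_fq is also made up for this example. Treat it as an illustration of the local-params syntax, not as a recommendation for your schema.

```python
from urllib.parse import urlencode


def post_filter_fq(function, lower, cost=200):
    """Build an fq string that is eligible to run as a post filter.

    Assumptions: the frange parser is used, and `function` is a function
    query over a field in your schema (hypothetical here). cache=false and
    cost >= 100 are the other two conditions described above.
    """
    return "{!frange l=%s cache=false cost=%d}%s" % (lower, cost, function)


# Repeated fq parameters are passed as separate key/value pairs.
params = [
    ("q", "*:*"),
    ("fq", "field2:value"),                    # normal filter, whole index
    ("fq", post_filter_fq("popularity", 10)),  # candidate post filter
]

query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a /select URL. The plain "field2:value" filter still goes through the lucene parser and is evaluated against the whole index as usual.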

Here's an example:

query1: fq = 'field1:* AND field2:value'
query2: fq = 'field2:value'

If the point of the "field1:*" query clause is "make sure field1 exists in the document" then you would be a lot better off with this query clause:

field1:[* TO *]

This is an all-inclusive range query. It works with all field types where I have tried it, and that includes TextField types. It will be a lot more efficient than the wildcard query.
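
As a rough sketch of that substitution, the Python below rewrites the fq from query1 so that the existence check uses the range form instead of the wildcard. The helper names exists_filter and build_fq are hypothetical and only exist for this example; the field names are the ones from the example queries.

```python
def exists_filter(field):
    """Return a clause that matches documents where `field` has any value.

    Uses the all-inclusive range form rather than the `field:*` wildcard.
    """
    return "%s:[* TO *]" % field


def build_fq(required_field, field, value):
    """Combine the existence check with a simple field:value clause."""
    return "%s AND %s:%s" % (exists_filter(required_field), field, value)


fq = build_fq("field1", "field2", "value")
print(fq)
```

The value of fq here replaces 'field1:* AND field2:value' from query1.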

Here's what happens with "field1:*". If field1 contains ten million distinct values, then the query that gets constructed for Lucene will literally contain ten million values. And every single one of them will need to be compared to every document. That's a LOT of comparisons. Wildcard queries are normally very slow.

Thanks,
Shawn
