Can you clarify why field:[* TO *] is a lot more efficient than field:*?

On Sun, Apr 14, 2019 at 12:14 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/13/2019 12:58 PM, John Davis wrote:
> > We noticed a sizable performance degradation when we add certain fq filters
> > to the query even though the result set does not change between the two
> > queries. I would've expected solr to optimize internally by picking the
> > most constrained fq filter first, but maybe my understanding is wrong.
>
> All filters cover the entire index, unless the query parser that you're
> using implements the PostFilter interface, the filter cost is set high
> enough, and caching is disabled. All three of those conditions must be
> met in order for a filter to only run on results instead of the entire
> index.
>
> http://yonik.com/advanced-filter-caching-in-solr/
> https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/
>
> Most query parsers don't implement the PostFilter interface. The lucene
> and edismax parsers do not implement PostFilter. Unless you've
> specified the query parser in the fq parameter, it will use the lucene
> query parser, and it cannot be a PostFilter.
>
> > Here's an example:
> >
> > query1: fq = 'field1:* AND field2:value'
> > query2: fq = 'field2:value'
>
> If the point of the "field1:*" query clause is "make sure field1 exists
> in the document" then you would be a lot better off with this query clause:
>
> field1:[* TO *]
>
> This is an all-inclusive range query. It works with all field types
> where I have tried it, and that includes TextField types. It will be a
> lot more efficient than the wildcard query.
>
> Here's what happens with "field1:*". If the cardinality of field1 is
> ten million different values, then the query that gets constructed for
> Lucene will literally contain ten million values. And every single one
> of them will need to be compared to every document. That's a LOT of
> comparisons. Wildcard queries are normally very slow.
>
> Thanks,
> Shawn
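
To make the rewrite suggested in the quoted message concrete, here is a minimal Python sketch that builds the range-based filter as request parameters. The helper names (exists_clause, build_select_params) are hypothetical and not part of Solr or any client library; q and fq are the usual Solr request parameters, and field1, field2 and value are the placeholders from the example in the quoted message. It only constructs the query string and does not measure anything.

```python
from urllib.parse import urlencode, parse_qs


def exists_clause(field):
    """Return an all-inclusive range clause: matches docs with any value in `field`."""
    # Suggested in the thread as a replacement for the wildcard form "field:*".
    return f"{field}:[* TO *]"


def build_select_params(required_field, field, value):
    """Hypothetical helper: URL-encode a match-all query plus one fq filter."""
    fq = f"{exists_clause(required_field)} AND {field}:{value}"
    # urlencode percent-escapes the brackets, colons and asterisks for the URL.
    return urlencode({"q": "*:*", "fq": fq})


params = build_select_params("field1", "field2", "value")
print(params)
# Decoding the parameters again recovers the filter exactly as written.
print(parse_qs(params)["fq"][0])
```

The resulting string can be appended to a select request for whichever collection is being queried; the collection name and host are left out here because the thread does not state them.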