Re: Optimizing fq query performance

Erick Erickson Sun, 14 Apr 2019 08:20:05 -0700

Patches welcome, but how would that be done? There’s no fixed schema at the 
Lucene level. It’s even possible that no two documents in the index have any 
fields in common. Given the structure of an inverted index, answering the 
question “does document X have any value for this field?” is rather 
“interesting”. You might be able to do something with docValues and function 
queries, but that’s overkill.


In some sense, fq=field:* already does this dynamically: the result is put in 
the filterCache, so it requires no calculation the next time. Tracking field 
existence internally therefore seems like more effort than it’s worth.
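
For what it's worth, here is a rough client-side sketch of the two workarounds 
Yonik describes in the quoted message below: sending each filter as its own fq 
parameter so it can be cached separately, and indexing a marker term so that 
the existence check becomes a single term query. The helper names are made up 
for illustration, and the "exists" field is an assumption: it would have to be 
defined in your schema as an indexed, multiValued string field. Nothing here 
talks to Solr; it only prepares the document and the query string.

```python
from urllib.parse import urlencode, parse_qs


def add_exists_terms(doc, tracked_fields, marker_field="exists"):
    """Return a copy of doc with a marker field listing each tracked field
    that has a value. "exists" is a hypothetical field name taken from the
    exists:field1 example in this thread."""
    present = [f for f in tracked_fields if doc.get(f) not in (None, "", [])]
    tagged = dict(doc)
    if present:
        tagged[marker_field] = present
    return tagged


def build_query_string(main_query, filters):
    """Encode every filter as its own fq parameter instead of joining the
    clauses with AND inside a single fq."""
    params = [("q", main_query)] + [("fq", f) for f in filters]
    return urlencode(params)


# Index time: tag the document before sending it to Solr.
doc = add_exists_terms({"id": "1", "field1": 42, "field2": "value"}, ["field1"])

# Query time: two separate fq parameters, the first a plain term query.
qs = build_query_string("*:*", ["exists:field1", "field2:value"])
print(doc)
print(parse_qs(qs)["fq"])
```

The trade-off is an extra field to maintain at index time in exchange for 
replacing field1:* with a cheap term lookup.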

Best,
Erick

> On Apr 13, 2019, at 11:24 PM, John Davis <johndavis925...@gmail.com> wrote:
> 
>> field1:* is slow in general for indexed fields because all terms for the
>> field need to be iterated (e.g. does term1 match doc1, does term2 match
>> doc1, etc)
> 
> This feels like something that could be optimized internally by tracking
> existence of the field in a doc instead of making users index yet another
> field to track existence?
> 
> BTW does this same behavior apply for tlong fields too where the value
> might be more continuous vs discrete strings?
> 
> On Sat, Apr 13, 2019 at 12:30 PM Yonik Seeley <ysee...@gmail.com> wrote:
> 
>> More constrained but matching the same set of documents just guarantees
>> that there is more information to evaluate per document matched.
>> For your specific case, you can optimize fq = 'field1:* AND field2:value'
>> to &fq=field1:*&fq=field2:value
>> This will at least cause field1:* to be cached and reused if it's a common
>> pattern.
>> field1:* is slow in general for indexed fields because all terms for the
>> field need to be iterated (e.g. does term1 match doc1, does term2 match
>> doc1, etc)
>> One can optimize this by indexing a term in a different field to turn it
>> into a single term query (i.e. exists:field1)
>> 
>> -Yonik
>> 
>> On Sat, Apr 13, 2019 at 2:58 PM John Davis <johndavis925...@gmail.com>
>> wrote:
>> 
>>> Hi there,
>>> 
>>> We noticed a sizable performance degradation when we add certain fq filters
>>> to the query even though the result set does not change between the two
>>> queries. I would've expected solr to optimize internally by picking the
>>> most constrained fq filter first, but maybe my understanding is wrong.
>>> Here's an example:
>>> 
>>> query1: fq = 'field1:* AND field2:value'
>>> query2: fq = 'field2:value'
>>> 
>>> If we assume that the result set is identical between the two queries and
>>> field1 is in general more frequent in the index, we noticed query1 takes
>>> 100x longer than query2. In case it matters field1 is of type tlongs while
>>> field2 is a string.
>>> 
>>> Any tips for optimizing this?
>>> 
>>> John
>>> 
>>
