Re: Understanding Negative Filter Queries

Erick Erickson Tue, 14 Jul 2020 07:09:39 -0700

There’s another possibility if the person who wrote the query (the one
I _should_ shoot) can’t change it: add cost=101 alongside the existing
cache=false and turn it into a post-filter. It’s not clear to me how
much difference that’d make, but it might be worth a shot; see:


https://yonik.com/advanced-filter-caching-in-solr-2/
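
For illustration, here is a minimal Python sketch of how the request parameters might look with the cost added. The helper name `build_params` and the example tag values are assumptions made up for this sketch; only the local params `cache=false` and `cost=101` come from the thread. Whether Solr really evaluates the clause as a post-filter depends on the query type supporting it; for a plain boolean query the cost may only influence the order in which non-cached filters run.

```python
from urllib.parse import urlencode


def build_params(main_query, excluded_tags, cost=101):
    """Build Solr request params with one non-cached, costed fq per excluded tag.

    Hypothetical helper: the name and signature are not part of Solr.
    """
    params = [("q", main_query)]
    for tag in excluded_tags:
        # {{ and }} are literal braces inside an f-string.
        fq = f"{{!cache=false cost={cost}}}(tag:* -tag:{tag})"
        params.append(("fq", fq))
    return params


params = build_params("*:*", ["email", "spam"])
query_string = urlencode(params)  # URL-encoded form, ready to append to a /select URL
print(query_string)
```

The encoded string can be appended to a `/select` request against whatever collection is being queried.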

Best,
Erick

> On Jul 14, 2020, at 8:33 AM, Chris Dempsey <cdal...@gmail.com> wrote:
> 
>> 
>> Well, they’ll be exactly the same if (and only if) every document has a
>> tag. Otherwise, the
>> first one will exclude a doc that has no tag and the second one will
>> include it.
> 
> 
> That's a good point/catch.
> 
>> How slow is “very slow”?
> 
> Well, in the case I was looking at it was about 10x slower, with the
> caveat that there were 15 or so of these negative fqs, all some
> version of `fq={!cache=false}(tag:* -tag:<something>)` (*don't shoot me, I
> didn't write it lol*), over 15 million documents. To me that means
> each fq was doing every step that you described below:
> 
>> The second form only has to index into the terms dictionary for the tag
>> field
>> value “email”, then zip down the posting list for all the docs that have
>> it. The
>> first form has to first identify all the docs that have a tag, accumulate
>> that list,
>> _then_ find the “email” value and zip down the postings list.
>> 
> 
> Thanks yet again Erick. That solidified in my mind how this works. Much
> appreciated!
> 
> 
> 
> 
> 
> On Tue, Jul 14, 2020 at 7:22 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Yeah, there are optimizations there. BTW, these two queries are subtly
>> different.
>> 
>> Well, they’ll be exactly the same if (and only if) every document has a
>> tag. Otherwise, the
>> first one will exclude a doc that has no tag and the second one will
>> include it.
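
A small Python sketch of that difference, using plain sets rather than Solr. The four sample documents are invented for illustration; "first form" means `tag:* -tag:email` and "second form" means `*:* -tag:email`.

```python
# Toy corpus: doc id -> fields. Doc 3 has no tag at all.
docs = {
    1: {"tag": ["email"]},
    2: {"tag": ["sms"]},
    3: {},
    4: {"tag": ["email", "sms"]},
}


def has_any_tag(doc):
    """True if the document has at least one value in the tag field."""
    return bool(doc.get("tag"))


def has_tag_value(doc, value):
    """True if the given value appears in the document's tag field."""
    return value in doc.get("tag", [])


# First form: must have some tag, must not have tag "email".
first_form = {i for i, d in docs.items()
              if has_any_tag(d) and not has_tag_value(d, "email")}

# Second form: any document, must not have tag "email".
second_form = {i for i, d in docs.items()
               if not has_tag_value(d, "email")}

# The untagged document is the only one the two forms disagree on.
only_in_second = second_form - first_form
print(first_form, second_form, only_in_second)
```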
>> 
>> How slow is “very slow”?
>> 
>> The second form only has to index into the terms dictionary for the tag
>> field
>> value “email”, then zip down the posting list for all the docs that have
>> it. The
>> first form has to first identify all the docs that have a tag, accumulate
>> that list,
>> _then_ find the “email” value and zip down the postings list.
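
To make the extra work concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the inverted index is a plain dict of term to doc-id list, `tag:*` is modelled as walking every term's postings for the field, and "work" is just a count of postings entries touched. Lucene's real data structures and optimizations are considerably more involved.

```python
# Toy inverted index for the "tag" field: term -> sorted doc ids (invented data).
postings = {
    "email": [1, 4],
    "sms": [2, 4],
    "push": [5],
}
max_doc = 6  # doc ids 0..5; docs 0 and 3 have no tag


def first_form(postings, excluded):
    """Model of tag:* -tag:<excluded>: collect all tagged docs, then subtract."""
    touched = 0
    with_tag = set()
    for plist in postings.values():
        touched += len(plist)          # every postings list is walked
        with_tag.update(plist)
    excluded_docs = postings.get(excluded, [])
    touched += len(excluded_docs)      # then the excluded term's list
    return with_tag - set(excluded_docs), touched


def second_form(postings, excluded, max_doc):
    """Model of *:* -tag:<excluded>: only the excluded term's list is walked."""
    excluded_docs = postings.get(excluded, [])
    return set(range(max_doc)) - set(excluded_docs), len(excluded_docs)


first_result, first_work = first_form(postings, "email")
second_result, second_work = second_form(postings, "email", max_doc)
print(first_work, second_work)
```

In this model the first form's cost grows with the total number of postings in the field, while the second form's cost depends only on the excluded term.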
>> 
>> If you require the first form’s functionality, you could get around this
>> by, say, including a boolean field “has_tags” set at index time; then the
>> first one would be
>> 
>> fq=has_tags:true -tag:email
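
A possible way to populate such a field before sending documents to Solr, sketched in Python. The helper `add_has_tags` is hypothetical, and it assumes the source field is called `tag` and that `has_tags` is defined as a boolean field in the schema.

```python
def add_has_tags(doc):
    """Return a copy of the document with a boolean has_tags field.

    Hypothetical pre-indexing step: has_tags is True only when the
    document carries at least one non-empty tag value.
    """
    enriched = dict(doc)
    enriched["has_tags"] = bool(doc.get("tag"))
    return enriched


batch = [
    {"id": "1", "tag": ["email"]},
    {"id": "2", "tag": []},
    {"id": "3"},
]
enriched_batch = [add_has_tags(d) for d in batch]
print(enriched_batch)
```

The filter on `has_tags:true` is then a single-term lookup instead of a scan over every term in the tag field.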
>> 
>> Best,
>> Erick
>> 
>>> On Jul 14, 2020, at 8:05 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>> 
>>> Hi Chris,
>>> tag:* is a wildcard query, while *:* is a match-all query. I believe that
>>> adjusting pure negative queries is turned on by default, so you can safely
>>> just use -tag:email and it’ll be translated to *:* -tag:email.
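
The idea behind that translation can be sketched in a few lines of Python. This is a toy string rewrite, not Solr's query parser: the function name is made up, and splitting clauses on whitespace is a simplifying assumption that would break on phrases or nested groups.

```python
def adjust_pure_negative(query):
    """Prepend a match-all clause when every top-level clause is negative.

    Toy illustration only; clauses are naively split on whitespace.
    """
    clauses = query.split()
    if clauses and all(c.startswith("-") for c in clauses):
        return "*:* " + " ".join(clauses)
    return query


rewritten = adjust_pure_negative("-tag:email")
unchanged = adjust_pure_negative("tag:* -tag:email")
print(rewritten)
print(unchanged)
```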
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 14 Jul 2020, at 14:00, Chris Dempsey <cdal...@gmail.com> wrote:
>>>> 
>>>> I'm trying to understand the difference between something like
>>>> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
>>>> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
>>>> 
>>>> I believe in the case of `tag:*` Solr spends some effort to gather all of
>>>> the documents that have a value for `tag` and then removes those with
>>>> `-tag:email`, while in the `*:*` case Solr simply uses the document set
>>>> as-is and then removes those with `-tag:email` (*and I believe Erick
>>>> mentioned there were special optimizations for `*:*`*)?
>>> 
>> 
>>
