Re: Understanding Negative Filter Queries

Chris Dempsey Tue, 14 Jul 2020 05:35:17 -0700

> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the first one will exclude a doc that has no tag and the
> second one will include it.



That's a good point/catch.
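
To make the difference concrete, here is a toy model in Python that treats each query as plain set arithmetic over document ids. The four documents and the `docs_with` helper are made up for illustration; none of this is Solr code.

```python
# Toy corpus: doc id -> list of tag values (doc 4 has no tag at all).
docs = {
    1: ["email"],
    2: ["sms"],
    3: ["email", "push"],
    4: [],
}

def docs_with(tag=None):
    """Ids of docs carrying the given tag, or any tag when tag is None."""
    if tag is None:
        return {doc_id for doc_id, tags in docs.items() if tags}
    return {doc_id for doc_id, tags in docs.items() if tag in tags}

all_docs = set(docs)                            # *:*
first_form = docs_with() - docs_with("email")   # tag:* -tag:email
second_form = all_docs - docs_with("email")     # *:* -tag:email

print(sorted(first_form))   # [2]
print(sorted(second_form))  # [2, 4]
```

Doc 4, the one with no tag, is the only place the two results differ.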

> How slow is “very slow”?
>

Well, in the case I was looking at it was about 10x slower, with the caveat
that there were 15 or so of these negative fqs, each some version of
`fq={!cache=false}(tag:* -tag:<something>)` (*don't shoot me, I didn't write
it lol*), over 15 million documents. To me that means each fq was doing
every step you described below:

> The second form only has to index into the terms dictionary for the tag
> field value “email”, then zip down the posting list for all the docs that
> have it. The first form has to first identify all the docs that have a tag,
> accumulate that list, _then_ find the “email” value and zip down the
> postings list.
>
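
Here is a rough Python sketch of how I picture that cost difference. It is a deliberate simplification of the description above, not how Lucene is actually implemented: the `postings` dictionary, `max_doc`, and both functions are invented for illustration, and "cost" is just the number of postings entries visited.

```python
# Toy inverted index for the `tag` field: term -> postings list of doc ids.
postings = {
    "email": [1, 3, 7],
    "push": [3, 5],
    "sms": [2, 5, 8],
}
max_doc = 10  # doc ids 0..9; some docs have no tag

def second_form(excluded):
    """*:* -tag:<excluded>: one term lookup, one postings list walked."""
    visited = 0
    result = set(range(max_doc))        # match-all needs no term lookups
    for doc_id in postings.get(excluded, []):
        visited += 1
        result.discard(doc_id)
    return result, visited

def first_form(excluded):
    """tag:* -tag:<excluded>: walk every term's postings, then the excluded one."""
    visited = 0
    has_tag = set()
    for plist in postings.values():     # accumulate all docs that have a tag
        for doc_id in plist:
            visited += 1
            has_tag.add(doc_id)
    for doc_id in postings.get(excluded, []):
        visited += 1
        has_tag.discard(doc_id)
    return has_tag, visited

print(first_form("email")[1], second_form("email")[1])  # 11 3
```

In this toy model the first form's extra work grows with the total number of postings in the field, and with `cache=false` it would be repeated for every one of the 15 or so fqs.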

Thanks yet again Erick. That solidified in my mind how this works. Much
appreciated!
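
For anyone finding this thread later, the `has_tags` workaround Erick suggests in the quoted message below might be prepared along these lines. This is only a sketch: `with_has_tags` and `negative_tag_fq` are hypothetical helpers, documents are assumed to be plain dicts built before they are sent to Solr, and the excluded value is assumed to need no query escaping.

```python
def with_has_tags(doc):
    """Return a copy of doc with a boolean has_tags field derived from tag."""
    enriched = dict(doc)
    # Missing or empty `tag` means the document has no tags.
    enriched["has_tags"] = bool(doc.get("tag"))
    return enriched

def negative_tag_fq(excluded):
    """Build the filter query for 'has some tag, but not this one'."""
    return "has_tags:true -tag:" + excluded

print(with_has_tags({"id": "1", "tag": ["email"]}))
print(with_has_tags({"id": "2"}))
print(negative_tag_fq("email"))
```

The boolean field lets the filter match a single term instead of gathering every document that has any value for `tag`.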





On Tue, Jul 14, 2020 at 7:22 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> Yeah, there are optimizations there. BTW, these two queries are subtly
> different.
>
> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the first one will exclude a doc that has no tag and the
> second one will include it.
>
> How slow is “very slow”?
>
> The second form only has to index into the terms dictionary for the tag
> field value “email”, then zip down the posting list for all the docs that
> have it. The first form has to first identify all the docs that have a tag,
> accumulate that list, _then_ find the “email” value and zip down the
> postings list.
>
> You could get around this if you require the first form functionality by,
> say, including a boolean field “has_tags”, then the first one would be
>
> fq=has_tags:true -tag:email
>
> Best,
> Erick
>
> > On Jul 14, 2020, at 8:05 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> > Hi Chris,
> > tag:* is a wildcard query while *:* is the match-all query. I believe that
> > adjusting pure negative is turned on by default so you can safely just use
> > -tag:email and it’ll be translated to *:* -tag:email.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 14 Jul 2020, at 14:00, Chris Dempsey <cdal...@gmail.com> wrote:
> >>
> >> I'm trying to understand the difference between something like
> >> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
> >> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
> >>
> >> I believe in the case of `tag:*` Solr spends some effort to gather all of
> >> the documents that have a value for `tag` and then removes those with
> >> `-tag:email` while in the `*:*` case Solr simply uses the document set as-is
> >> and then removes those with `-tag:email` (*and I believe Erick mentioned
> >> there were special optimizations for `*:*`*)?
> >
>
>
