> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the first one will exclude a doc that has no tag and the
> second one will include it.

That's a good point/catch.

> How slow is “very slow”?

Well, in the case I was looking at it was about 10x slower but with the
following caveats that there were 15 or so of these negative fq all some
version of `fq={!cache=false}(tag:* -tag:<something>)` (*don't shoot me I
didn't write it lol*) over 15 million documents. Which to me means that each
fq was doing each step that you described below:

> The second form only has to index into the terms dictionary for the tag
> field value “email”, then zip down the posting list for all the docs that
> have it. The first form has to first identify all the docs that have a
> tag, accumulate that list, _then_ find the “email” value and zip down the
> postings list.

Thanks yet again Erick. That solidified in my mind how this works. Much
appreciated!

On Tue, Jul 14, 2020 at 7:22 AM Erick Erickson <erickerick...@gmail.com> wrote:

> Yeah, there are optimizations there. BTW, these two queries are subtly
> different.
>
> Well, they’ll be exactly the same if (and only if) every document has a
> tag. Otherwise, the first one will exclude a doc that has no tag and the
> second one will include it.
>
> How slow is “very slow”?
>
> The second form only has to index into the terms dictionary for the tag
> field value “email”, then zip down the posting list for all the docs that
> have it. The first form has to first identify all the docs that have a
> tag, accumulate that list, _then_ find the “email” value and zip down the
> postings list.
>
> You could get around this if you require the first form functionality by,
> say, including a boolean field “has_tags”, then the first one would be
>
> fq=has_tags:true -tags:email
>
> Best,
> Erick
>
> > On Jul 14, 2020, at 8:05 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> > Hi Chris,
> > tag:* is a wildcard query while *:* is match all query. I believe that
> > adjusting pure negative is turned on by default so you can safely just
> > use -tag:email and it’ll be translated to *:* -tag:email.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >> On 14 Jul 2020, at 14:00, Chris Dempsey <cdal...@gmail.com> wrote:
> >>
> >> I'm trying to understand the difference between something like
> >> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
> >> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
> >>
> >> I believe in the case of `tag:*` Solr spends some effort to gather all of
> >> the documents that have a value for `tag` and then removes those with
> >> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
> >> and then remove those with `-tag:email` (*and I believe Erick mentioned
> >> there were special optimizations for `*:*`*)?
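
The difference between the two filter forms can be sketched with a toy inverted index. This is only an illustration in plain Python, not Solr or Lucene internals: the four documents, the `build_postings` and `has_any_tag` helpers, and the set-based posting lists are all made up for the example. It assumes `tag:*` behaves like a union over every term's posting list in the field, which is roughly the "identify all the docs that have a tag, accumulate that list" step described in the thread.

```python
# Toy model only: hypothetical documents and helpers, not how Solr stores data.
docs = {
    1: {"tag": ["email", "urgent"]},
    2: {"tag": ["sms"]},
    3: {"tag": ["email"]},
    4: {},  # a document with no tag at all
}


def build_postings(documents, field):
    """Map each term of `field` to the set of doc ids containing it."""
    postings = {}
    for doc_id, fields in documents.items():
        for term in fields.get(field, []):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def has_any_tag(postings):
    """Model of tag:* -- walk every term's posting list and union them."""
    result = set()
    for doc_ids in postings.values():
        result |= doc_ids
    return result


tag_postings = build_postings(docs, "tag")
all_docs = set(docs)                      # model of *:*
email_docs = tag_postings.get("email", set())

# fq=(tag:* -tag:email): only docs that have some tag, minus the email ones.
first_form = has_any_tag(tag_postings) - email_docs

# fq=(*:* -tag:email): every doc, minus the email ones.
second_form = all_docs - email_docs

# Workaround from the thread: a precomputed boolean "has_tags" field acts as
# a single posting list, so no per-term union is needed at query time.
has_tags_true = {doc_id for doc_id, fields in docs.items() if fields.get("tag")}
has_tags_form = has_tags_true - email_docs

print(first_form)     # doc 4 (no tag) is excluded
print(second_form)    # doc 4 (no tag) is included
print(has_tags_form)  # same result as first_form
```

In this model the first form touches every term in the `tag` field before subtracting, while the second form and the `has_tags` variant each read one precomputed set, which matches the cost difference described above. The two forms only return the same documents when every document has a tag.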