@Erick, you've got the idea. Basically, users can attach zero or more tags (*that they create*) to a document. As an example, say they've created the following tags (just a small subset of the total):

- paid
- invoice-paid
- ms-reply-unpaid-2019
- credit-ms-reply-unpaid
- ms-reply-paid-2019
- ms-reply-paid-2020

and attached them in various combinations to documents. They then want to find all documents by tag that don't contain the characters "paid" anywhere in the tag, don't contain tags with the characters "ms-reply-unpaid", but do include documents tagged with the characters "ms-reply-paid".

The obvious suggestion would be to have users match on the entire tag (i.e. not allow a "contains") so the wildcards go away. That would work, but unfortunately we have customers with (*not joking*) over 100K different tags (*why anyone would have a taxonomy like that is a different issue*). I'm willing to accept that in our scenario n-grams might be the Solr-based answer (the other being to change what "contains" means within our application), but I thought I'd check that I hadn't overlooked any other options. :)

On Mon, Jun 29, 2020 at 3:54 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Chris.
> I suppose index time analysis can yield these terms:
> "paid","ms-reply-unpaid","ms-reply-paid", and thus let you avoid these
> expensive wildcard queries. Here's why it's worth avoiding them:
> https://www.slideshare.net/lucidworks/search-like-sql-mikhail-khludnev-epam
>
> On Mon, Jun 29, 2020 at 6:17 PM Chris Dempsey <cdal...@gmail.com> wrote:
>
> > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*), but
> > I'm looking into options for optimizing something like this:
> >
> >     fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid*
> >
> > It's probably not a surprise that we're seeing performance issues with
> > something like this. My understanding is that using the wildcard on both
> > ends forces Lucene to examine every term in the field's term dictionary.
> > Something like the above can't take advantage of something like the
> > ReversedWildcardFilterFactory either. I believe constructing `n-grams` is
> > an option (*at the expense of index size*), but is there anything I'm
> > overlooking as a possible avenue to look into?
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
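For anyone weighing the n-gram route Chris mentions, here is a minimal plain-Python sketch (not Solr code) of why it helps. It assumes a toy in-memory index: `scan_contains` stands in for a double-ended wildcard such as `tag:*paid*`, which has to visit every distinct tag value, while `gram_contains` answers the same question from n-gram postings, roughly what a copy of `tag` analyzed with Solr's `NGramFilterFactory` would provide. The class and function names, the gram sizes 3..5, and the sample documents are illustrative assumptions; `fq_matches` mirrors the `fq` exactly as written in the thread.

```python
from collections import defaultdict


class NGramTagIndex:
    """Toy model of a multivalued `tag` field plus an n-gram copy of it.

    Hypothetical sketch: `terms` plays the role of the term dictionary of
    the plain field, `grams` the role of a field analyzed into character
    n-grams of sizes min_gram..max_gram.
    """

    def __init__(self, min_gram=3, max_gram=5):
        self.min_gram, self.max_gram = min_gram, max_gram
        self.terms = defaultdict(set)   # whole tag value -> doc ids
        self.grams = defaultdict(set)   # character n-gram -> doc ids
        self.doc_tags = {}              # doc id -> stored tag values

    def add(self, doc_id, tags):
        self.doc_tags[doc_id] = list(tags)
        for tag in tags:
            self.terms[tag].add(doc_id)
            for n in range(self.min_gram, self.max_gram + 1):
                for i in range(len(tag) - n + 1):
                    self.grams[tag[i:i + n]].add(doc_id)

    def scan_contains(self, needle):
        """Like tag:*needle*: visit every distinct term in the dictionary."""
        hits = set()
        for term, docs in self.terms.items():
            if needle in term:
                hits |= docs
        return hits

    def gram_contains(self, needle):
        """Same answer, driven by n-gram postings lookups where possible."""
        if len(needle) < self.min_gram:
            return self.scan_contains(needle)  # too short to look up directly
        if len(needle) <= self.max_gram:
            # Exact: every substring of this length was indexed.
            return set(self.grams.get(needle, ()))
        # Longer needles: intersect the postings of their max_gram-grams,
        # then verify, since the grams may come from different tags or
        # appear in a different order.
        pieces = [needle[i:i + self.max_gram]
                  for i in range(len(needle) - self.max_gram + 1)]
        candidates = set.intersection(*(self.grams.get(p, set()) for p in pieces))
        return {d for d in candidates
                if any(needle in t for t in self.doc_tags[d])}


def fq_matches(index, contains):
    """(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid*"""
    has_tag = {d for d, tags in index.doc_tags.items() if tags}  # tag:*
    return ((has_tag - contains("paid"))
            | (has_tag - contains("ms-reply-unpaid"))
            | contains("ms-reply-paid"))


if __name__ == "__main__":
    idx = NGramTagIndex()
    idx.add(1, ["paid"])
    idx.add(2, ["invoice-paid", "ms-reply-unpaid-2019"])
    idx.add(3, ["ms-reply-paid-2019"])
    idx.add(4, ["credit-ms-reply-unpaid"])
    idx.add(5, [])
    idx.add(6, ["urgent"])
    print(sorted(fq_matches(idx, idx.scan_contains)))  # [1, 3, 6]
    print(sorted(fq_matches(idx, idx.gram_contains)))  # [1, 3, 6]
```

The index-size cost shows up directly: each tag of length L contributes up to (max_gram - min_gram + 1) × L grams, and needles longer than `max_gram` still need a verification pass against the stored tag values.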
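Mikhail's suggestion avoids wildcards entirely by deriving the interesting terms at index time. Below is a minimal plain-Python sketch of that idea. It assumes the set of substrings users filter on (`MARKERS`) is known before documents are indexed, and that the derived terms land in a hypothetical extra field, here called `tag_marker`. Where the derivation would actually run (client code, an update processor, or an analysis chain) is left open. The sample documents are the same illustrative ones as above.

```python
# Hypothetical list of substrings users filter on; assumed known at index time.
MARKERS = ("paid", "ms-reply-unpaid", "ms-reply-paid")


def derive_markers(tags, markers=MARKERS):
    """Index-time step: return the markers that occur inside any tag value.

    The result would be stored in a separate, hypothetical `tag_marker`
    field, so queries only ever need exact term lookups.
    """
    return {m for m in markers if any(m in tag for tag in tags)}


def index_docs(docs):
    """docs: doc id -> tag list. Returns (docs with any tag, marker -> doc ids)."""
    has_tag = {d for d, tags in docs.items() if tags}
    postings = {m: set() for m in MARKERS}
    for doc_id, tags in docs.items():
        for m in derive_markers(tags):
            postings[m].add(doc_id)
    return has_tag, postings


def fq_matches(has_tag, postings):
    """Same logic as the original fq, but every clause is an exact term:
    (tag:* -tag_marker:"paid") OR (tag:* -tag_marker:"ms-reply-unpaid")
    OR tag_marker:"ms-reply-paid"
    """
    return ((has_tag - postings["paid"])
            | (has_tag - postings["ms-reply-unpaid"])
            | postings["ms-reply-paid"])


docs = {
    1: ["paid"],
    2: ["invoice-paid", "ms-reply-unpaid-2019"],
    3: ["ms-reply-paid-2019"],
    4: ["credit-ms-reply-unpaid"],
    5: [],
    6: ["urgent"],
}
has_tag, postings = index_docs(docs)
print(sorted(fq_matches(has_tag, postings)))  # [1, 3, 6]
```

The trade-off: every clause becomes a cheap exact term lookup, but a substring that isn't in `MARKERS` at index time can't be queried this way without reindexing, whereas the n-gram approach doesn't need the substrings up front.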