First of all, wildcards are evil. Be sure that whatever people are
using wildcards for wouldn't be better served by proper tokenizing,
stemming, and the like.

Assuming that wildcards must be handled though, there are two main strategies:
1> If you want to use leading wildcards, look at
ReversedWildcardFilterFactory. For something like abc* (trailing
wildcard), conceptually Lucene has to construct a big OR query of
every term that starts with "abc". That's not hard and is also pretty
fast: just jump to the first term that starts with "abc" and gather
all of them (they're sorted lexically) until you get to the first term
starting with "abd".
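
To make the "jump and gather" idea concrete, here is a minimal Python
sketch over a plain sorted list. It is not Lucene's actual code; the
function name prefix_terms and the sample terms are made up for
illustration.

```python
from bisect import bisect_left

def prefix_terms(sorted_terms, prefix):
    """Return every term in a sorted list that starts with prefix."""
    # Binary search jumps straight to the first possible match.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: nothing after this point can match
        matches.append(term)
    return matches

# Hypothetical term dictionary for one field.
terms = sorted(["abacus", "abc", "abcd", "abcxyz", "abd", "zebra"])
print(prefix_terms(terms, "abc"))  # ['abc', 'abcd', 'abcxyz']
```

The scan stops at "abd" and never looks at "zebra", which is why
trailing wildcards are comparatively cheap.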

_Leading_ wildcards are a whole 'nother story. *abc means that each
and every distinct term in the field must be enumerated. The first
term could be aaaaaaaaabc and the last term in the field zzzzzzzabc.
There's no way to tell without checking every one.
ReversedWildcardFilterFactory handles this by indexing the term, well,
reversed, so in the above example not only would the term aaaaaaaaabc
be indexed, but also cbaaaaaaaaa. Now both leading and trailing
wildcards can automagically be handled as trailing wildcards: *abc
becomes cba* against the reversed terms.
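
A rough Python sketch of the reversal trick, under the assumption that
the index is just a sorted list of strings. The helper names
add_reversed and leading_to_prefix are hypothetical, and the sketch
skips details such as how a real filter keeps reversed tokens from
being confused with ordinary ones.

```python
def add_reversed(terms):
    """Index each term both as-is and reversed (sorted, de-duplicated)."""
    return sorted(set(terms) | {t[::-1] for t in terms})

def leading_to_prefix(pattern):
    """Turn a leading-wildcard pattern like '*abc' into the prefix 'cba'."""
    if not pattern.startswith("*"):
        raise ValueError("expected a leading wildcard")
    return pattern[1:][::-1]

index = add_reversed(["aaaaaaaaabc", "zzzzzzzabc", "abcdef"])
prefix = leading_to_prefix("*abc")

# A real index would seek to the prefix; a linear filter keeps this short.
hits = [t[::-1] for t in index if t.startswith(prefix)]
print(hits)  # ['aaaaaaaaabc', 'zzzzzzzabc']
```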

2> If you must allow leading and trailing wildcards on the same term
(*abc*), consider ngramming; bigrams are usually sufficient. So aaabcde
is indexed as aa, aa, ab, bc, cd, de and searching for *abc* becomes
searching for the phrase "ab bc".
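
As an illustration only, the bigram idea can be sketched in Python like
this. The functions bigrams and contains_phrase are invented for the
example and stand in for the analysis chain and the phrase query.

```python
def bigrams(term):
    """Split a term into overlapping two-character grams."""
    return [term[i:i + 2] for i in range(len(term) - 1)]

def contains_phrase(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase
               for i in range(len(tokens) - n + 1))

indexed = bigrams("aaabcde")  # ['aa', 'aa', 'ab', 'bc', 'cd', 'de']
query = bigrams("abc")        # *abc* turns into the phrase "ab bc"

print(contains_phrase(indexed, query))           # True
print(contains_phrase(indexed, bigrams("abd")))  # False
```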

Both of these make the index larger, but usually by surprisingly
little. People will also index these variants in separate fields on
occasion; it depends on the use cases you need to support. Ngramming,
for instance, would also match a plain search for "ab" in the above
(no wildcards)....

Best,
Erick
On Sun, Sep 9, 2018 at 1:40 PM John Blythe <johnbly...@gmail.com> wrote:
>
> hi all. we just migrated to cloud on friday night (woohoo!). everything is
> looking good (great!) overall. we did, however, just run into a hiccup.
> running a query like this got us a 504 gateway time-out error:
>
> **some* *foo* *bar* *query**
>
> it was about 6 partials with encapsulating wildcards that someone was
> running that gave the error. doing 4 or 5 of them worked fine, but upon
> adding the last one or two it went caput. all operations have been zippier
> since the migration before doing some of those wildcard queries which took
> time (if they worked at all). is this something related directly w our
> server configuration or is there some solr/cloud config'ing that we could
> work on that would allow better response to these sorts of queries (though
> it'd be at a cost, i'd imagine!).
>
> thanks for any insight!
>
> best,
>
> --
> John Blythe
