Re: Leading wildcards

Michael Kimsal Thu, 19 Apr 2007 07:34:40 -0700

Agreed, but in our tests (100M index) it wasn't a performance hit, and much
better (as in it actually worked) than MSSQL  ;)




On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Apr 19, 2007, at 6:56 AM, Michael Kimsal wrote:
> It's bugged us a little bit, because it's something that we need
> (to be able to emulate the previous foo LIKE '%bar%' SQL behaviour
> we're replacing), but can't offer our users yet.

I have also run into this issue and have intended to fix up Solr to
allow configuring that switch on QueryParser.  I'll eventually get to
this, but someone supplying a patch with a test case would get it done
sooner.

I must, however, caveat any discussion of leading wildcards with the
underlying effect you get.  If you use standard analysis and perform
a leading wildcard query, you incur a (possibly) dramatic hit in
terms of performance, because Lucene has to scan *every* term in the
specified field.  In fact, with my 3.7M index, a fuzzy query grinds
to a halt for the very same reason.  There is also a switch on fuzzy
query that needs to be configurable through Solr, to set the number
of leading characters that are fixed and so avoid this all-term
scanning.
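
To illustrate why the scan happens, here is a minimal Python sketch. It
is not Lucene code: it assumes the term dictionary behaves like a sorted
list of strings, and the function names (prefix_run, leading_wildcard,
fuzzy_candidates) are hypothetical. A trailing wildcard can seek straight
to its prefix, a leading wildcard has nothing to seek on, and a fixed
fuzzy prefix narrows the candidates in the same way a trailing wildcard
does.

```python
from bisect import bisect_left


def prefix_run(sorted_terms, prefix):
    """Return the terms starting with prefix by seeking into the sorted list."""
    start = bisect_left(sorted_terms, prefix)
    end = start
    # Matching terms are contiguous in sorted order, so stop at the first miss.
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


def trailing_wildcard(sorted_terms, prefix):
    """bar* : only the run of terms sharing the prefix is read."""
    return prefix_run(sorted_terms, prefix)


def leading_wildcard(sorted_terms, suffix):
    """*bar : sort order does not help, so every term is tested."""
    return [t for t in sorted_terms if t.endswith(suffix)]


def fuzzy_candidates(sorted_terms, word, prefix_length):
    """Terms a fuzzy match would have to compare against word.

    With prefix_length 0 this is the whole list; a fixed prefix shrinks it.
    """
    return prefix_run(sorted_terms, word[:prefix_length])


terms = sorted(["bar", "barn", "crowbar", "foo", "foobar", "zebra"])
```

With this toy list, fuzzy_candidates(terms, "fob", 0) hands back all six
terms, while a prefix length of 2 leaves only the terms starting with
"fo" to compare.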

There are techniques that can be used to improve the performance of
in-string types of queries like this, at the expense of indexing time
and size and clever query creation.  One such technique I've used
successfully is term rotation enumeration (cat => cat$, at$c, t$ca).
This involves custom analyzers and query creation.
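
The rotation idea can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not the analyzer described
above: the index is modelled as a sorted list of (rotated form, original
term) pairs, "$" is assumed never to occur inside a term, and the names
rotations, build_index, contains and ends_with are hypothetical. The
point is that an in-string or leading-wildcard query turns into a prefix
lookup on the rotated forms.

```python
from bisect import bisect_left

END = "$"  # end-of-term marker, as in the cat$ example


def rotations(term):
    """Rotations of term + END that begin at a character of the term."""
    marked = term + END
    return [marked[i:] + marked[:i] for i in range(len(term))]


def build_index(terms):
    """Sorted (rotated_form, original_term) pairs; this is the size cost."""
    return sorted((rot, t) for t in set(terms) for rot in rotations(t))


def prefix_lookup(index, prefix):
    """Original terms having at least one rotation that starts with prefix."""
    i = bisect_left(index, (prefix,))
    hits = set()
    while i < len(index) and index[i][0].startswith(prefix):
        hits.add(index[i][1])
        i += 1
    return hits


def contains(index, fragment):
    """*fragment* becomes a prefix lookup on the rotated forms."""
    return prefix_lookup(index, fragment)


def ends_with(index, suffix):
    """*suffix becomes a prefix lookup for suffix followed by the marker."""
    return prefix_lookup(index, suffix + END)


index = build_index(["cat", "concatenate", "dog", "bobcat"])
```

Here contains(index, "cat") finds cat, concatenate and bobcat, while
ends_with(index, "cat") finds only cat and bobcat, in both cases without
walking the whole term list. The trade-off is that each term is stored
once per character.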

Once Solr supports this switch, you may find performance fine with
leading wildcard queries, but at least be forewarned that there are
scalability skeletons in this closet.

        Erik



--
Michael Kimsal
http://webdevradio.com
