Re: Manipulating a Fuzzy Query's Prefix Length

Kyle Lee Wed, 20 Jul 2011 20:39:31 -0700

Update:

Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with
substantial performance improvements.

To tide us over until that release, we've simply rebuilt from source with
a default prefix length of 2, which suits our needs for now.

On Wed, Jul 20, 2011 at 10:09 AM, Kyle Lee <randall.kyle....@gmail.com> wrote:

> We're performing fuzzy searches on a field with a large number of
> unique terms. Specifying a minimum similarity of 0.7 results in a
> query execution time of 13-15 seconds, in stark contrast to our
> average query time of 40 ms.
>
> We suspect the performance problem stems from enumerating every
> unique term in the index. The Lucene documentation for FuzzyQuery
> supports this theory with the following warning:
>
> "*Warning:* this query is not very scalable with its default prefix length
> of 0 - in this case, *every* term will be enumerated and cause an edit score
> calculation."
>
> We would therefore like to set the prefix length to one or two,
> requiring the first couple of characters to match and thereby
> substantially reducing the number of terms enumerated. Is this possible
> with Solr? If so, I haven't yet found a way. Any help would be greatly
> appreciated.
>
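The reasoning in the question, that requiring a shared prefix cuts down how many terms get an edit-distance calculation, can be sketched in plain Java. This is a hypothetical, brute-force illustration, not Lucene's FuzzyQuery implementation: the class and method names (`FuzzyPrefixSketch`, `fuzzyMatch`) are made up, a small in-memory list stands in for the index's term dictionary, and the similarity formula (1 minus edit distance over the shorter term's length) is a simplifying assumption loosely modeled on pre-4.0 fuzzy scoring.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not Lucene code: a brute-force fuzzy term
 * enumeration showing how a non-zero prefix length reduces the number
 * of terms that need an edit-distance calculation.
 */
public class FuzzyPrefixSketch {

    /** Matching terms plus the number of terms that were actually scored. */
    static final class Result {
        final List<String> hits;
        final int scored;

        Result(List<String> hits, int scored) {
            this.hits = hits;
            this.scored = scored;
        }
    }

    /** Standard two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // After the swap, prev holds the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Simplified similarity (an assumption): 1 - distance / length of the shorter term. */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) levenshtein(a, b) / shorter;
    }

    /** Scores only the terms that share the query's first prefixLength characters. */
    static Result fuzzyMatch(List<String> terms, String query, float minSim, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        List<String> hits = new ArrayList<>();
        int scored = 0;
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                continue; // skipped without any edit-distance work
            }
            scored++;
            if (similarity(term, query) >= minSim) {
                hits.add(term);
            }
        }
        return new Result(hits, scored);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "apply", "ample", "maple",
                "banana", "bandana", "cherry", "applet");
        for (int prefixLength : new int[] {0, 2}) {
            Result r = fuzzyMatch(terms, "apple", 0.7f, prefixLength);
            System.out.println("prefix=" + prefixLength + " scored=" + r.scored + " hits=" + r.hits);
        }
    }
}
```

Running `main` prints `prefix=0 scored=8 hits=[apple, apply, ample, applet]` and then `prefix=2 scored=3 hits=[apple, apply, applet]`. With a prefix length of 0 every term is scored, as the FuzzyQuery warning describes; with 2, only the terms starting with "ap" are. The output also shows the trade-off: "ample" is a good match but is never considered because it differs in the second character. A real index can exploit its sorted term dictionary to jump straight to the prefix range, whereas this sketch simply filters a list, which is enough to show the drop in scored terms.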
