Update: Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with substantial performance improvements.
To tide us over until that release, we've simply rebuilt from source with a default prefix length of 2, which suits our needs for now.

On Wed, Jul 20, 2011 at 10:09 AM, Kyle Lee <randall.kyle....@gmail.com> wrote:

> We're performing fuzzy searches on a field possessing a large number of
> unique terms. Specifying a required minimum similarity of 0.7 results in a
> query execution time of 13-15 seconds, which stands in stark contrast to our
> average query time of 40ms.
>
> We suspect that the performance problem most likely emanates from the
> enumeration over all the unique terms in the index. The Lucene documentation
> for FuzzyQuery supports this theory with the following warning:
>
> *"Warning:* this query is not very scalable with its default prefix length
> of 0 - in this case, *every* term will be enumerated and cause an edit score
> calculation."
>
> We would therefore like to set the prefix length to one or two, mandating
> that the first couple of characters match and thereby substantially reduce
> the number of terms enumerated. Is this possible with Solr? I haven't yet
> discovered a method, if so. Any help would be greatly appreciated.
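
For anyone wondering why a non-zero prefix length helps so much, here is a minimal, self-contained sketch of the idea. It is not Lucene's or Solr's code: the class `PrefixFuzzyDemo`, its method names, and the toy term list are all made up for illustration, and it uses a plain maximum edit distance instead of Lucene's similarity score. It only shows that requiring a shared prefix lets you seek into the sorted term dictionary and score a small slice of it, rather than every unique term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class; none of these names come from Lucene or Solr.
class PrefixFuzzyDemo {

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Terms that must be scored: all of them for prefix length 0,
     *  otherwise only those sharing the query's first prefixLength chars. */
    static SortedSet<String> candidates(TreeSet<String> terms, String query,
                                        int prefixLength) {
        if (prefixLength <= 0) return terms;
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        // Half-open range [prefix, prefix + '\uffff'): a simplification that
        // skips terms whose next character is exactly '\uffff'.
        return terms.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    /** Scores only the candidate terms and keeps those within maxEdits. */
    static List<String> fuzzyMatch(TreeSet<String> terms, String query,
                                   int prefixLength, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : candidates(terms, query, prefixLength)) {
            if (editDistance(term, query) <= maxEdits) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "solar", "solr", "sold", "salt", "lucene", "lucid", "search"));
        for (int prefixLength : new int[] {0, 2}) {
            System.out.println("prefixLength=" + prefixLength
                    + " scored=" + candidates(terms, "solr", prefixLength).size()
                    + " hits=" + fuzzyMatch(terms, "solr", prefixLength, 1));
        }
    }
}
```

With this toy dictionary, prefix length 0 scores all 7 terms while prefix length 2 scores only the 3 that start with "so", and both return the same hits. The trade-off is that a match differing within the prefix itself (say, a typo in the first two characters) is no longer found.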