subject:"Why exactly are fuzzy queries so slow\?"

Why exactly are fuzzy queries so slow?

2007-11-24 Thread Timo Nentwig

Hi! I search an 1.5 gig index and fuzzy queries are really slow; something like avg. ~500ms (IndexSearcher.search(Query, HitCollector)). When performing exact queries I archieve response times <25ms. What is it that makes fuzzy queries so slow? Increased index access due to more terms, i.e. d

Re: Why exactly are fuzzy queries so slow?

2007-11-24 Thread Mathieu Lecarme

fuzzy are simply not indexed. If you wont to search quickly with fuzzy search, you should index word and their ngrams, it's the "do you mean" pattern. you first select used word wich share ngram with the query word, the distance is computed with levenstein, and you use this word as a synon

Re: Why exactly are fuzzy queries so slow?

2007-11-24 Thread markharw00d

The added IO is one factor. Another is the CPU load from doing many edit-distance comparisons between index terms and the provided search term. You can limit the number of edit distance comparisons conducted by setting the minimum prefix length. This is a property of the QueryParser if parsing

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig

On Saturday 24 November 2007 18:28:48 markharw00d wrote: > The added IO is one factor. Another is the CPU load from doing many > edit-distance comparisons between index terms and the provided search You mean FuzzyQuery.rewrite(). Are you sure this is a CPU and not an IO issue (reading the terms f

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig

On Saturday 24 November 2007 18:48:18 Mathieu Lecarme wrote: > fuzzy are simply not indexed. > If you wont to search quickly with fuzzy search, you should index word > and their ngrams, it's the "do you mean" pattern. replacing fuzzy with "did you mean" is indeed my favourite option however so fa

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig

On Saturday 24 November 2007 18:28:48 markharw00d wrote: > term. You can limit the number of edit distance comparisons conducted by > setting the minimum prefix length. This is a property of the QueryParser Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread markharw00d

For "fuzzy" you're going to pay one way or another. You can use ngram analyzers on indexed content and queries which will add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in both your query and index) or you can use some form of query-time edit distance comparison on "files" a

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Mathieu Lecarme

Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind of "wildcard fuzzy" but not real fuzzy anymore. I understand the optimitation but right now I hardly can image a reasonable use-case. Who care whether the levenstein distance is a the beginnen, middle

Re: Why exactly are fuzzy queries so slow?

2007-11-26 Thread Timo Nentwig

On Sunday 25 November 2007 11:54:15 markharw00d wrote: > For "fuzzy" you're going to pay one way or another. But which one is the cheapest? :) > You can use ngram analyzers on indexed content and queries which will > add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in > both you

Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

Re: Why exactly are fuzzy queries so slow?

9 matches

Site Navigation

Mail list logo

Footer information