fuzzy queries

2013-02-09 Thread Pierre Antoine DuBoDeNa
> Hello,
> I use Lucene 3.6 and I am trying to use fuzzy queries so that I can match many more results.
> For example, I am adding these strings:
> list.add("string matching");
> list.add("string123 matching");
> list.add

Re: fuzzy queries

2013-02-09 Thread Michael McCandless
>> I use Lucene 3.6 and I am trying to use fuzzy queries so that I can match many more results.
>> For example, I am adding these strings:
>> list.add("string matching");
>> list.add("string123 matching");

Re: fuzzy queries

2013-02-09 Thread Jack Krupansky
…essentially: Match a document if EITHER term matches. So, if NEITHER matches (within an edit distance of 2), the document is not a match.
-- Jack Krupansky
-----Original Message----- From: Pierre Antoine DuBoDeNa Sent: Saturday, February 09, 2013 12:52 PM To: java-user@lucene.apache.org Subject: Re: fuz
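The OR semantics Jack describes can be sketched without Lucene at all. This is a hypothetical illustration (the class `FuzzyOr` and its methods are not Lucene API), and plain Levenshtein distance with a fixed limit of 2 stands in for Lucene's own fuzzy matching: a document matches if at least one query term is within 2 edits of at least one document token.

```java
import java.util.List;

// Hypothetical illustration of "match if EITHER term matches" (Boolean OR of fuzzy terms).
// Not Lucene's implementation: it compares raw Levenshtein distances against a fixed limit.
class FuzzyOr {
    static final int MAX_EDITS = 2;

    // Classic two-row dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // True if ANY query term is within MAX_EDITS of ANY document token (OR semantics).
    static boolean matches(List<String> queryTerms, List<String> docTokens) {
        for (String q : queryTerms) {
            for (String t : docTokens) {
                if (levenshtein(q, t) <= MAX_EDITS) return true;
            }
        }
        return false;
    }
}
```

Under this sketch, the query term `string` on its own does not reach the token `string123`: that takes three insertions, one more than the two-edit limit.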

Re: fuzzy queries

2013-02-09 Thread Pierre Antoine DuBoDeNa
> …essentially: Match a document if EITHER term matches. So, if NEITHER matches (within an edit distance of 2), the document is not a match.
> -- Jack Krupansky
> -----Original Message----- From: Pierre Antoine DuBoDeNa
> Sent: Saturday, February 09, 2013 12:52 PM
> To: java-use

Re: fuzzy queries

2013-02-10 Thread Pierre Antoine DuBoDeNa
>> …an edit distance of 2 or less.
>> Your query is essentially: Match a document if EITHER term matches. So, if NEITHER matches (within an edit distance of 2), the document is not a match.
>> -- Jack Krupansky
>> -Ori

Help with Fuzzy Queries

2008-03-06 Thread Eloi Rocha Neto
Hi, I am new to Lucene. I don't understand how Lucene works in some cases. For example, suppose I have an index with the following three entries:
- ATUAÇÃO FALHA DE DISJUNTOR (breaker failure trip)
- RESET DE FALHA DE DISJUNTOR (breaker failure reset)
- FALHA DE COMANDO (command failure)
When I try to look for something similar to "FALHA DE DI

Re: Help with Fuzzy Queries

2008-03-11 Thread Chris Hostetter
: When I try to look for something similar to "FALHA DE DISJUNTOR", I've
: got the following results:
: Result | Score
: FALHA DE COMANDO | 0.9277342
: ATUAÇÃO FALHA DE DISJUNTOR | 0.8880876
: RESET DE FALHA DE DISJUNTOR | 0.5709133
Your best bet to make sense of scoring i

Performance improvements for fuzzy queries ?

2012-02-03 Thread Paul Taylor
This seems wrong because these results still seem relevant. More problematically, the fuzzy query scores are so much lower than those of normal and phrase matches that fuzzy queries don't seem to work when mixed in with other queries. Is there a better option, or even some better documentat

Re: Performance improvements for fuzzy queries ?

2012-03-08 Thread Paul Taylor
or field/norm of the matching document. This seems wrong because these results still seem relevant. More problematically, the fuzzy query scores are so much lower than those of normal and phrase matches that fuzzy queries don't seem to work when mixed in with other queries. Is there a better o

FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-05 Thread Jason Rutherglen
Hello, I'm interested in getting FastSSFuzzy into Lucene, perhaps as a contrib module. One question is how much would the index grow? We've got a list of people's names we want to do spellchecking on for example. -J

Why exactly are fuzzy queries so slow?

2007-11-24 Thread Timo Nentwig
Hi! I search a 1.5 GB index and fuzzy queries are really slow: about 500 ms on average (IndexSearcher.search(Query, HitCollector)). For exact queries I achieve response times under 25 ms. What is it that makes fuzzy queries so slow? Increased index access due to more terms,

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-05 Thread Grant Ingersoll
Do you have a reference paper/link on it? Sounds interesting. On Jan 5, 2009, at 8:17 PM, Jason Rutherglen wrote: Hello, I'm interested in getting FastSSFuzzy into Lucene, perhaps as a contrib module. One question is how much would the index grow? We've got a list of people's names we

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-05 Thread Robert Muir
Hi, although I've been trying to get my code into shape to upload to JIRA (the holidays got in the way a bit), I think there might be some issues making my implementation work for general use. I based my design on certain assumptions, such as the fact that I don't update indexes. Once my index is

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-06 Thread Robert Muir
Hi, yes, I verify the results that come back from the Lucene index at runtime before expanding the query. I considered trying to store delete positions as payloads or something, but fastssWC is good enough for me. I'll see about posting my code today. On Tue, Jan 6, 2009 at 4:52 AM, Thomas Bocek

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-06 Thread Glen Newton
- Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/
- Paper: Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/ifi-2007.02.pdf
- FastSimilarSearch.java http://fastss.csg.uzh.ch/FastSimilarSearch.java
- Paper: Fast Similarity Search in Peer-to-Peer Networks
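The core FastSS idea from the linked paper can be sketched as follows: index every variant of each dictionary word obtained by deleting up to k characters, generate the same deletion neighborhood for the query, and treat any word sharing a variant as a candidate. This is a simplified sketch, not the FastSSFuzzy or fastssWC code discussed in the thread; the class and method names are hypothetical. As Robert describes, candidates are verified with a real edit-distance check, because a shared deletion variant can produce false positives.

```java
import java.util.*;

// Hypothetical, simplified FastSS-style lookup (deletion neighborhoods + verification).
class FastSsSketch {
    // All strings obtainable from word by deleting at most k characters (word itself included).
    static Set<String> deletionNeighborhood(String word, int k) {
        Set<String> result = new HashSet<>();
        result.add(word);
        Set<String> frontier = Set.of(word);
        for (int d = 0; d < k; d++) {
            Set<String> next = new HashSet<>();
            for (String w : frontier) {
                for (int i = 0; i < w.length(); i++) {
                    next.add(w.substring(0, i) + w.substring(i + 1)); // delete char i
                }
            }
            result.addAll(next);
            frontier = next;
        }
        return result;
    }

    // Index: deletion variant -> dictionary words that produce it.
    static Map<String, Set<String>> buildIndex(Collection<String> dictionary, int k) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String word : dictionary) {
            for (String v : deletionNeighborhood(word, k)) {
                index.computeIfAbsent(v, x -> new HashSet<>()).add(word);
            }
        }
        return index;
    }

    // Collect candidates via shared variants, then keep only those truly within k edits.
    static Set<String> search(Map<String, Set<String>> index, String query, int k) {
        Set<String> candidates = new HashSet<>();
        for (String v : deletionNeighborhood(query, k)) {
            candidates.addAll(index.getOrDefault(v, Set.of()));
        }
        Set<String> verified = new TreeSet<>();
        for (String c : candidates) {
            if (levenshtein(query, c) <= k) verified.add(c);
        }
        return verified;
    }

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

On Jason's index-growth question: a word of length n contributes at most C(n,0) + C(n,1) + ... + C(n,k) variants, for example at most 1 + 5 + 10 = 16 for a five-letter word with k = 2, so the index grows quickly with k.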

Re: Why exactly are fuzzy queries so slow?

2007-11-24 Thread Mathieu Lecarme
…this word as a synonym. M. On 24 Nov 07 at 17:36, Timo Nentwig wrote: Hi! I search a 1.5 GB index and fuzzy queries are really slow: about 500 ms on average (IndexSearcher.search(Query, HitCollector)). For exact queries I achieve response times under 25 ms. What is i

Re: Why exactly are fuzzy queries so slow?

2007-11-24 Thread markharw00d
The added IO is one factor. Another is the CPU load from doing many edit-distance comparisons between index terms and the provided search term. You can limit the number of edit distance comparisons conducted by setting the minimum prefix length. This is a property of the QueryParser if parsing
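The prefix-length trade-off markharw00d mentions can be illustrated with a brute-force scan over a sorted term list standing in for the term dictionary. This is a hypothetical sketch (names are illustrative), roughly in the spirit of how pre-4.0 Lucene FuzzyQuery enumerated candidate terms; later versions use Levenshtein automata instead. It counts the edit-distance comparisons made with and without a required common prefix.

```java
import java.util.*;

// Hypothetical brute-force fuzzy term scan with an optional non-fuzzy prefix.
class PrefixFuzzySketch {
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int comparisons = 0; // number of edit-distance computations performed
    }

    static Result fuzzyScan(SortedSet<String> terms, String query, int maxEdits, int prefixLength) {
        Result r = new Result();
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        // tailSet jumps to the first term >= prefix, like seeking in a sorted term dictionary.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break; // sorted order: no later term shares the prefix
            r.comparisons++;
            if (levenshtein(query, term) <= maxEdits) r.matches.add(term);
        }
        return r;
    }

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

With the terms apple, file, files, filter, final, mile, tiles, zebra and the query `files` (1 edit), a prefix length of 0 compares all 8 terms and finds file, files, tiles; a prefix length of 2 compares only 4 terms but misses `tiles`, because its difference is in the first characters. That is the cost Timo objects to in his replies.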

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
On Saturday 24 November 2007 18:28:48 markharw00d wrote:
> The added IO is one factor. Another is the CPU load from doing many
> edit-distance comparisons between index terms and the provided search
You mean FuzzyQuery.rewrite(). Are you sure this is a CPU and not an IO issue (reading the terms f

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
…my favourite option; however, so far I don't know how to do it (in my case). Are there some examples to look at? I think one of the problems with fuzzy queries is that they search for all terms that match the given Levenshtein distance. It doesn't care whether a particular term

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Timo Nentwig
On Saturday 24 November 2007 18:28:48 markharw00d wrote:
> term. You can limit the number of edit distance comparisons conducted by
> setting the minimum prefix length. This is a property of the QueryParser
Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread markharw00d
For "fuzzy" you're going to pay one way or another. You can use ngram analyzers on indexed content and queries which will add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in both your query and index) or you can use some form of query-time edit distance comparison on "files" a
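The n-gram alternative can be sketched like this: split each term into character n-grams, index those, and treat shared grams as evidence of similarity. The gram sizes (2 to 4) and the Dice-coefficient overlap score are assumptions chosen for illustration, and the class name is hypothetical; this is not a Lucene analyzer.

```java
import java.util.*;

// Hypothetical n-gram similarity sketch (not a Lucene analyzer).
class NGramSketch {
    // All character n-grams with minGram <= n <= maxGram, ordered by size, then position.
    static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Dice coefficient over distinct grams: 2|A ∩ B| / (|A| + |B|); 1.0 means identical gram sets.
    static double dice(String a, String b, int minGram, int maxGram) {
        Set<String> ga = new HashSet<>(ngrams(a, minGram, maxGram));
        Set<String> gb = new HashSet<>(ngrams(b, minGram, maxGram));
        if (ga.isEmpty() && gb.isEmpty()) return 1.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }
}
```

A five-letter term such as "files" already yields nine grams for sizes 2 to 4, which is where the extra index size and IO cost come from; in exchange, "file" and "files" share six grams (Dice 0.8) without any edit-distance computation at query time.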

Re: Why exactly are fuzzy queries so slow?

2007-11-25 Thread Mathieu Lecarme
Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So, this is some kind of "wildcard fuzzy" but not real fuzzy anymore. I understand the optimization, but right now I can hardly imagine a reasonable use case. Who cares whether the Levenshtein distance is at the beginning, middle

Re: Why exactly are fuzzy queries so slow?

2007-11-26 Thread Timo Nentwig
On Sunday 25 November 2007 11:54:15 markharw00d wrote:
> For "fuzzy" you're going to pay one way or another.
But which one is the cheapest? :)
> You can use ngram analyzers on indexed content and queries which will
> add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in
> both you

Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
Hello, While testing my code that integrates the Highlighter class from org.apache.lucene.search.highlight, I found out that for wildcard and fuzzy queries, it generates no best fragments. Any particular reason why that is the case? Shouldn't the highlighter be able to work just like

Re: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Erik Hatcher
PM, Dmitry Goldenberg wrote: Hello, While testing my code that integrates the Highlighter class from org.apache.lucene.search.highlight, I found out that for wildcard and fuzzy queries, it generates no best fragments. Any particular reason why that is the case? Shouldn't the highli

RE: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
EMAIL PROTECTED] Sent: Tue 12/27/2005 11:03 AM To: java-user@lucene.apache.org Subject: Re: Wildcard and Fuzzy queries - no best fragments generated - ?? You have to _rewrite_ the Query for this to work. This, I believe, is mentioned in the javadocs. I think you are hijacking a thread with your r
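Why rewriting matters can be shown with a Lucene-free sketch: a wildcard (or fuzzy) query holds a pattern rather than concrete terms, so a highlighter looking for query terms in the text finds nothing until the query has been expanded into the actual index terms it matches. In the Lucene API of that era this expansion was `Query.rewrite(IndexReader)`, as the Highlighter package documentation advised; the class and methods below are hypothetical stand-ins, and only `*` and `?` wildcards are handled.

```java
import java.util.*;
import java.util.regex.Pattern;

// Hypothetical illustration of "rewrite before highlighting" (no Lucene classes involved).
class RewriteSketch {
    // "Rewrite": expand a wildcard pattern (* and ?) into the concrete index terms it matches.
    static Set<String> rewrite(String wildcard, Collection<String> indexTerms) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c))); // literal character
        }
        Pattern p = Pattern.compile(regex.toString());
        Set<String> terms = new TreeSet<>();
        for (String t : indexTerms) {
            if (p.matcher(t).matches()) terms.add(t);
        }
        return terms;
    }

    // Naive highlighter: wrap each space-separated token that equals one of the given terms.
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) out.append(' ');
            out.append(terms.contains(token) ? "<B>" + token + "</B>" : token);
        }
        return out.toString();
    }
}
```

Highlighting "lucene makes search lucid" with the raw pattern `luc*` marks nothing, because no token is literally `luc*`; after rewriting against the index terms, both `lucene` and `lucid` are marked. A fuzzy query behaves the same way, expanding to the terms within the allowed edit distance.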

Re: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Erik Hatcher
a "..."
String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
System.out.println(result);
}
From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tue 12/27/2005 11:03 AM To: java-user@lucene.apache.org Subjec

RE: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Dmitry Goldenberg
ik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tue 12/27/2005 12:13 PM To: java-user@lucene.apache.org Subject: Re: Wildcard and Fuzzy queries - no best fragments generated - ??
On Dec 27, 2005, at 2:34 PM, Dmitry Goldenberg wrote:
> What do you mean by _rewriting_ the query? I checked all the

Re: Wildcard and Fuzzy queries - no best fragments generated - ??

2005-12-27 Thread Erik Hatcher
ldcard and Fuzzy queries - no best fragments generated - ?? On Dec 27, 2005, at 2:34 PM, Dmitry Goldenberg wrote: What do you mean by _rewriting_ the query? I checked all the classes in the highlighter package and did not see any mention of having to rewrite. From Highlighter's packa

What is edit distance 2 mean for fuzzy queries?

2019-06-28 Thread baris . kazar
HILLSBOROUGH" region="NEW HAMPSHIRE" country="UNITED STATES"
Name: Marblehead Dr
Score: 28.291311
ID: 12762505
Country Code: US
Coordinates: 42.79743, -71.50919
Search Key: street="MARBLEHEAD" city="NASHUA" municipality="HILLSBOROUGH" region="NEW HAMPSHIRE" country="UNITED STATES"
RID is at an edit distance of two from RIDGE, right? Should I enable something during indexing for fuzzy queries?
Best regards
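Baris's arithmetic can be checked directly: with Levenshtein distance (insertion, deletion, and substitution each cost 1), RID to RIDGE takes two insertions, so the distance is 2. Lucene's FuzzyQuery also counts a swap of two adjacent characters as a single edit by default (its `transpositions` option), so the sketch below adds an optional transposition rule (the "optimal string alignment" variant). The class name is hypothetical and this is not Lucene's automaton-based implementation.

```java
// Hypothetical edit-distance helper; not Lucene's implementation.
class EditDistance {
    // Levenshtein distance; if transpositions is true, swapping two adjacent
    // characters also costs 1 (optimal string alignment variant of Damerau-Levenshtein).
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
                }
            }
        }
        return d[a.length()][b.length()];
    }
}
```

`distance("RID", "RIDGE", ...)` is 2 with or without transpositions, whereas `RIDGE` versus `RDIGE` is 1 with transpositions and 2 without.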

Re: How does lucene handle the wildcard and fuzzy queries ?

2012-11-27 Thread Jack Krupansky
sues related to "internals" aren't appropriate on "user" lists. -- Jack Krupansky -Original Message- From: sri krishna Sent: Tuesday, November 27, 2012 12:36 PM To: java-user@lucene.apache.org Subject: How does lucene handle the wildcard and fuzzy queries ? How does lu

Re: What is edit distance 2 mean for fuzzy queries?

2019-06-28 Thread baris . kazar
"NEW HAMPSHIRE" country="UNITED STATES"
Name: Hartford Ln
Score: 28.291311
ID: 9817672
Country Code: US
Coordinates: 42.78252, -71.49689
Search Key: street="HARTFORD" city="NASHUA" municipality="HILLSBOROUGH" region="NEW HAMPSHIRE"

Re: What is edit distance 2 mean for fuzzy queries?

2019-08-04 Thread Furkan KAMACI
HILLSBOROUGH" region="NEW
> > HAMPSHIRE" country="UNITED STATES"
> >
> > Name: Pennichuck St
> > Score: 28.291311
> > ID: 8022314
> > Country Code: US
> > Coordinates: 42.79266, -71.46672
> > Search Key: street="PENNICHUCK"

Re: What is edit distance 2 mean for fuzzy queries?

2019-08-05 Thread Baris Kazar
...@gmail.com To: java-user@lucene.apache.org Sent: Sunday, August 4, 2019 8:41:27 PM GMT -05:00 US/Canada Eastern Subject: Re: What is edit distance 2 mean for fuzzy queries? Hi Baris, Terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed