Re: Getting irrelevant results using fuzzy query

2008-06-28 Thread László Monda
Cheers Mark - Original Message From: László Monda [EMAIL PROTECTED] To: java-user@lucene.apache.org Cc: [EMAIL PROTECTED] Sent: Monday, 23 June, 2008 1:11:50 PM Subject: Re: Getting irrelevant results using fuzzy query Thanks for your reply, Mark. This was my original code

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote: On Wednesday, 18 June 2008, László Monda wrote: Additional info: Lucene seems to do the right thing when only a few documents are present, but goes crazy when there are about 1.5 million documents in the index. Lucene works well with

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
Hi Mark, On Wed, 2008-06-18 at 21:09 +0100, markharw00d wrote: This looks like it is related to an issue I first raised here: http://markmail.org/message/37ywsemfudpos6uh At the time I identified 2 issues with FuzzyQuery - that the usual coord and idf scoring factors shouldn't be

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
Hi Daniel, On Wed, 2008-06-18 at 20:37 +0200, Daniel Naber wrote: On Wednesday, 18 June 2008, László Monda wrote: Since fuzzy searching is based on the Levenshtein distance, the distance between coldplay and coldplay is 0 and the distance between coldplay and downplay is 3, so how on

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread mark harwood
To: java-user@lucene.apache.org Cc: [EMAIL PROTECTED] Sent: Monday, 23 June, 2008 12:10:05 PM Subject: Re: Getting irrelevant results using fuzzy query On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote: On Wednesday, 18 June 2008, László Monda wrote: Additional info: Lucene seems to do

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread László Monda
From: László Monda [EMAIL PROTECTED] To: java-user@lucene.apache.org Cc: [EMAIL PROTECTED] Sent: Monday, 23 June, 2008 12:10:05 PM Subject: Re: Getting irrelevant results using fuzzy query On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote: On Wednesday, 18 June 2008, László Monda wrote

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread mark harwood
[EMAIL PROTECTED] To: java-user@lucene.apache.org Cc: [EMAIL PROTECTED] Sent: Monday, 23 June, 2008 1:11:50 PM Subject: Re: Getting irrelevant results using fuzzy query Thanks for your reply, Mark. This was my original code for constructing my query using FuzzyQuery: BooleanQuery query = new

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread Daniel Naber
On Monday, 23 June 2008, László Monda wrote: According to the current Lucene documentation at http://lucene.apache.org/java/2_3_2/api/index.html it seems to me that the Query class doesn't have any explain() methods. It's in the IndexSearcher and it takes a query and a document number as its

Getting irrelevant results using fuzzy query

2008-06-18 Thread László Monda
Hi List, I've been redirected from [EMAIL PROTECTED] to here to discuss my issue. -- My original email -- I'm trying to provide relevant results for the users of a lyrics site, even in the case of misspellings, by indexing artists and songs with Lucene. The problem is that Lucene

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote: Since fuzzy searching is based on the Levenshtein distance, the distance between coldplay and coldplay is 0 and the distance between coldplay and downplay is 3, so how on earth is it possible that when searching for coldplay, Lucene returns
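
For reference, the two distances quoted above can be checked with a few lines of plain Java. This is only a sketch: levenshtein() is a textbook edit distance written for this note, not Lucene's own code, and similarity() assumes the fuzzy score is normalised as 1 - distance / (length of the shorter term), with an assumed default cut-off of 0.5. Neither the normalisation nor the cut-off is stated in the thread.

```java
// Sketch only: textbook Levenshtein distance, not Lucene's implementation.
class FuzzyDistanceSketch {

    /** Edit distance where insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int viaInsertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(viaInsertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** ASSUMPTION: similarity = 1 - distance / length of the shorter term. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0 : 0.0;
        }
        return 1.0 - (double) levenshtein(a, b) / shorter;
    }

    public static void main(String[] args) {
        double minSimilarity = 0.5; // assumed default cut-off, not taken from the thread
        for (String candidate : new String[] {"coldplay", "downplay"}) {
            double sim = similarity("coldplay", candidate);
            System.out.println(candidate
                    + " distance=" + levenshtein("coldplay", candidate)
                    + " similarity=" + sim
                    + " matches=" + (sim > minSimilarity));
        }
    }
}
```

Under those assumptions a distance of 3 on an 8-letter term still clears the cut-off, so downplay is a legitimate fuzzy match for coldplay. The open question in the thread is why it ranks so high, not why it matches at all.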

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday, 18 June 2008, László Monda wrote: Additional info: Lucene seems to do the right thing when only a few documents are present, but goes crazy when there are about 1.5 million documents in the index. Lucene works well with more documents (currently using it with 9 million), but the

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread markharw00d
This looks like it is related to an issue I first raised here: http://markmail.org/message/37ywsemfudpos6uh At the time I identified 2 issues with FuzzyQuery - that the usual coord and idf scoring factors shouldn't be applied to fuzzy queries. The coord factor got fixed but idf remains an
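
To illustrate why idf can distort fuzzy results, here is a rough sketch in plain Java. The idf() method follows the formula I believe Lucene's DefaultSimilarity uses, 1 + ln(numDocs / (docFreq + 1)); treat that as an assumption and check it against your Lucene version. The document frequencies for coldplay and downplay are invented for the example. Only the index size of 1.5 million comes from the thread.

```java
// Sketch only: shows how a rare fuzzy variant can get a larger idf than the
// term the user actually typed. Not Lucene code.
class IdfSketch {

    /** ASSUMPTION: same shape as DefaultSimilarity.idf, 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1500000;      // index size mentioned in the thread
        int coldplayDocFreq = 500;  // hypothetical: a common artist name
        int downplayDocFreq = 2;    // hypothetical: a rare word in the index

        System.out.println("idf(coldplay) = " + idf(coldplayDocFreq, numDocs));
        System.out.println("idf(downplay) = " + idf(downplayDocFreq, numDocs));
    }
}
```

The rarer a term is, the larger its idf, so an expanded fuzzy variant that occurs in only a handful of documents can get a bigger scoring factor than the correctly spelled term. The effect grows with index size, which would fit the observation that results only go wrong on the large index.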