Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Leo Galambos
AFAIK, Lucene cannot look up DNA strings effectively. You would use DASG+Lev (see my previous post, 05/30/2003 19:16 CEST). -g- Jim Hargrave wrote: Our application is a string similarity searcher where the query is an input string and we want to find all "fuzzy" variants of the input stri

Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Jim Hargrave
Probably shouldn't have added that last bit. Our app isn't a DNA searcher. But DASG+Lev does look interesting. Our app is a linguistic application. We want to search for sentences which have many ngrams in common and rank them based on the score below. Similar to the TELLTALE system (do a goog

Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Leo Galambos
I see. Are you looking for this: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html ? On the other hand, if n is not fixed, you still have a problem. As far as I can tell from reading this list, Lucene reads a dictionary (of terms) into memory, and it also allocates o

RE: String similarity search vs. typcial IR application...

2003-06-06 Thread Frank Burough
05, 2003 5:55 PM > To: Lucene Users List > Subject: Re: String similarity search vs. typcial IR application... > > > AFAIK Lucene is not able to look DNA strings up effectively. > You would > use DASG+Lev (see my previous post - 05/30/2003 1916CEST). > > -g- >

Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Leo Galambos
ased on this approach. I don't know if it ever left the lab and made it into the mainstream. If I have time I will explore this a bit. Frank Burough -----Original Message----- From: Leo Galambos [mailto:[EMAIL PROTECTED]] Sent: Thursday, June 05, 2003 5:55 PM To: Lucene Users List Subje

RE: String similarity search vs. typcial IR application...

2003-06-06 Thread Frank Burough
To: Lucene Users List > Subject: Re: String similarity search vs. typcial IR application... > > > Exact matches are not ideal for DNA applications, I guess. I am not a > DNA expert, but those guys often need a feature that is termed > ``fuzzy''[*] in Lucene. They need Leven
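For readers following the thread: the quoted message breaks off at "Leven", and the earlier posts mention "Lev". Assuming both refer to Levenshtein (edit) distance, a minimal Python sketch of that measure could look like the following. The function name `levenshtein` is made up for illustration; this is not Lucene's fuzzy-query code and not the DASG+Lev method mentioned in the thread.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("GATTACA", "GACTATA"))
```

This computes the distance for one pair of strings in O(len(a) * len(b)) time. Looking up all near matches in a large collection needs an index on top of it, which is the problem the thread is discussing.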

Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Ype Kingma
On Thursday 05 June 2003 14:12, Jim Hargrave wrote: > Our application is a string similarity searcher where the query is an input > string and we want to find all "fuzzy" variants of the input string in the > DB. The score is basically Dice's coefficient: 2C/(Q+D), where C is the > number of terms (
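As an illustration of the score quoted above, here is a minimal Python sketch of Dice's coefficient, 2C/(Q+D), over character n-grams. The quoted definition is cut off, so this assumes the usual reading: C is the number of distinct n-grams the query and the document share, and Q and D are the numbers of distinct n-grams in each. The names `char_ngrams` and `dice` and the default n=3 are made up for this sketch and do not come from the original poster's application or from Lucene.

```python
def char_ngrams(text, n=3):
    """Return the set of distinct character n-grams in text (lowercased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice(query, doc, n=3):
    """Dice's coefficient 2C/(Q+D) over the n-gram sets of query and doc."""
    q = char_ngrams(query, n)
    d = char_ngrams(doc, n)
    if not q and not d:
        return 0.0  # both strings shorter than n: no n-grams to compare
    return 2 * len(q & d) / (len(q) + len(d))


if __name__ == "__main__":
    # "night" and "nacht" share one bigram ("ht") out of 4 + 4.
    print(dice("night", "nacht", n=2))
```

Comparing sets ignores how often an n-gram repeats. A variant that counts repeats would use `collections.Counter` and sum the minimum counts for C.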