Re: How best to compare tow sentences

2014-12-05 Thread Paul Taylor
On 05/12/2014 01:25, Chris Hostetter wrote: For a number of years I've been doing this by creating a RAMDirectory, creating a document for one of the sentences and then doing a search using the other sentence and seeing if we get a good match. This has worked reasonably well

Re: How best to compare tow sentences

2014-12-04 Thread Chris Hostetter
For a number of years I've been doing this by creating a RAMDirectory, creating a document for one of the sentences and then doing a search using the other sentence and seeing if we get a good match. This has worked reasonably well but since improving the performance of other

RE: How best to compare tow sentences

2014-12-04 Thread Oliver Christ
, December 03, 2014 11:49 AM To: java-user@lucene.apache.org; paul_t...@fastmail.fm Subject: Re: How best to compare tow sentences There are various implementations of Damerau-Levenshtein online. I don't know how much it will improve your results, however. Why are you not indexing all o

Re: How best to compare tow sentences

2014-12-04 Thread Barry Coughlan
There are various implementations of Damerau-Levenshtein online. I don't know how much it will improve your results, however. Why are you not indexing all of the strings? If you don't have to compute all possible pairs, then you are better off without Lucene. Note that the cosine similarity calcul
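Barry's pointer to Damerau-Levenshtein can be sketched as below. This is a minimal Java sketch, not any particular library's implementation: it computes the restricted variant (optimal string alignment), which counts insertions, deletions, substitutions and swaps of two adjacent items as one edit each. It assumes that comparing whole words rather than characters suits short titles; the class name `WordEditDistance` and its naive tokenizer (lower-case, split on anything that is not a letter or digit) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: word-level optimal string alignment (restricted Damerau-Levenshtein).
final class WordEditDistance {
    // Naive tokenizer: lower-case, then treat every run of non-letter/non-digit chars as a separator.
    static List<String> tokens(String s) {
        String trimmed = s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
        return trimmed.isEmpty() ? List.<String>of() : Arrays.asList(trimmed.split(" "));
    }

    // d[i][j] = edits needed to turn the first i tokens of a into the first j tokens of b.
    static int osaDistance(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution or match
                // Transposition of two adjacent tokens counts as a single edit.
                if (i > 1 && j > 1
                        && a.get(i - 1).equals(b.get(j - 2))
                        && a.get(i - 2).equals(b.get(j - 1))) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // Normalizes the distance to [0, 1], where 1 means identical token sequences.
    static double similarity(String s1, String s2) {
        List<String> a = tokens(s1), b = tokens(s2);
        int longest = Math.max(a.size(), b.size());
        return longest == 0 ? 1.0 : 1.0 - (double) osaDistance(a, b) / longest;
    }
}
```

For example, "the quick fox" and "quick the fox" are one transposition apart, and "Love Me Do" versus "Love Me Tender" is one substitution over three words (similarity of about 0.667). Running the same recurrence over lists of single characters gives the usual character-level distance.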

Re: How best to compare tow sentences

2014-12-04 Thread parnab kumar
Hi, if you are comparing two song titles, which are usually very short, you are better off using a custom set of several features rather than any single one of cosine, Levenshtein or Jaccard. You may use a combination of the following: 1. cosine similarity score 2. Jaccard overlap coefficient 3. how many words in th
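The feature-combination idea can be sketched in Java as follows. Only the first two features from the list (cosine score and Jaccard coefficient) are implemented, because the rest of the list is cut off. The class name `TitleFeatures`, the naive tokenizer, raw term counts without IDF weighting, and the weighted sum in `combinedScore` are all assumptions for illustration; in practice the weights would be tuned or learned on labelled title pairs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns a pair of titles into a small feature vector.
final class TitleFeatures {
    // Naive tokenizer: lower-case, split on anything that is not a letter or digit.
    static String[] tokens(String s) {
        String t = s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
        return t.isEmpty() ? new String[0] : t.split(" ");
    }

    // Raw term frequencies (no IDF weighting).
    static Map<String, Integer> termFreqs(String s) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : tokens(s)) tf.merge(tok, 1, Integer::sum);
        return tf;
    }

    // Cosine of the two term-frequency vectors; 0 if either title has no tokens.
    static double cosine(String s1, String s2) {
        Map<String, Integer> a = termFreqs(s1), b = termFreqs(s2);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Jaccard index of the word sets; 0 if both titles are empty (an arbitrary choice).
    static double jaccard(String s1, String s2) {
        Set<String> a = new HashSet<>(Arrays.asList(tokens(s1)));
        Set<String> b = new HashSet<>(Arrays.asList(tokens(s2)));
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    // Feature vector suitable for a classifier or a hand-tuned scorer.
    static double[] features(String s1, String s2) {
        return new double[] { cosine(s1, s2), jaccard(s1, s2) };
    }

    // Hypothetical linear combination; the weights are caller-supplied placeholders.
    static double combinedScore(String s1, String s2, double wCosine, double wJaccard) {
        double[] f = features(s1, s2);
        return wCosine * f[0] + wJaccard * f[1];
    }
}
```

Keeping the features separate, rather than collapsing them straight into one number, lets further features (such as the truncated third item) be appended later without changing the callers that consume the vector.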

Re: How best to compare tow sentences

2014-12-03 Thread Shashi Kant
Paul, for a pairwise comparison, cosine similarity does pretty well for most purposes. On Wed, Dec 3, 2014 at 10:45 AM, Paul Taylor wrote: On 03/12/2014 15:14, Barry Coughlan wrote: Hi Paul, I don't have much expertise in this area so hopefully others will answer, but maybe this
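For reference, the cosine similarity of two titles represented as term-frequency vectors a and b is shown below, worked through on "Love Me Do" and "Love Me Tender" (two example titles chosen for illustration, not taken from the thread) over the vocabulary (love, me, do, tender), using raw counts with no IDF weighting:

$$
\cos(\mathbf{a},\mathbf{b}) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},
\qquad
\mathbf{a} = (1,1,1,0),\;
\mathbf{b} = (1,1,0,1),\;
\cos(\mathbf{a},\mathbf{b}) = \frac{1+1+0+0}{\sqrt{3}\,\sqrt{3}} = \frac{2}{3} \approx 0.667
$$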

Re: How best to compare tow sentences

2014-12-03 Thread Paul Taylor
On 03/12/2014 15:14, Barry Coughlan wrote: Hi Paul, I don't have much expertise in this area so hopefully others will answer, but maybe this is better than nothing. I don't know many out-of-the-box solutions for this problem, but I'm sure they exist. Mahout and Carrot2 might be worth investi

Re: How best to compare tow sentences

2014-12-03 Thread Barry Coughlan
Hi Paul, I don't have much expertise in this area so hopefully others will answer, but maybe this is better than nothing. I don't know many out-of-the-box solutions for this problem, but I'm sure they exist. Mahout and Carrot2 might be worth investigating. Similarity Metrics: - Jaccard Index. Me
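The Jaccard index treats each title as a set of words and divides the shared words by all distinct words. Worked on the same two illustrative titles as sets A = {love, me, do} and B = {love, me, tender}:

$$
J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|\{\text{love},\text{me}\}|}{|\{\text{love},\text{me},\text{do},\text{tender}\}|} = \frac{2}{4} = 0.5
$$

Because it ignores word counts and order, Jaccard scores "me love do" identically to "love me do"; that is one reason to pair it with an order-aware measure such as an edit distance.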