On 05/12/2014 01:25, Chris Hostetter wrote:
: For a number of years I've been doing this by creating a
: RAMDirectory, creating a document for one of the sentences and then doing a
: search using the other sentence and seeing if we get a good match. This has
: worked reasonably well but since improving the performance of other
, December 03, 2014 11:49 AM
To: java-user@lucene.apache.org; paul_t...@fastmail.fm
Subject: Re: How best to compare tow sentences
There are various implementations of Damerau-Levenshtein online. I don't
know how much it will improve your results however.
Why are you not indexing all of the strings? If you don't have to compute
all possible pairs, then you are better off without Lucene.
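For anyone who wants to try the edit-distance route without hunting for a library, here is a minimal sketch. Note that it implements the optimal string alignment variant (a restricted form of Damerau-Levenshtein in which no substring is edited more than once), not the unrestricted algorithm. The class and method names are made up for illustration and do not come from Lucene.

```java
// Hypothetical helper, not part of Lucene: optimal string alignment distance,
// i.e. Levenshtein plus transposition of two adjacent characters.
class OsaDistanceSketch {
    static int distance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];
        // Turning a prefix into the empty string costs one deletion per character.
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // match or substitution
                // Adjacent transposition, e.g. "ow" <-> "wo".
                if (i > 1 && j > 1
                        && s.charAt(i - 1) == t.charAt(j - 2)
                        && s.charAt(i - 2) == t.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

With this sketch, OsaDistanceSketch.distance("tow", "two") is 1, because the swapped letters count as a single transposition rather than two substitutions. It compares characters, so for whole sentences you would probably want to normalise by length before treating the result as a similarity score.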
Note that the cosine similarity calcul
Hi,
If you are comparing two song titles, which are usually very short, you are
better off using a custom set of several features rather than just one of
cosine, Levenshtein, or Jaccard. You may use a combination of the
following:
1. cosine sim score
2. Jaccard overlap coeff
3. how many words in th
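As an illustration of one of the features listed above, the Jaccard index over word sets could be sketched roughly as follows. This assumes lower-cased, whitespace-separated tokens and treats two empty titles as identical; both choices are assumptions of this sketch, and the class name is hypothetical. It covers only the Jaccard feature and does not attempt the weighting or combination of several features.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: Jaccard index |A ∩ B| / |A ∪ B| over word sets.
class JaccardSketch {
    // Assumption: tokens are lower-cased and split on whitespace only.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    static double jaccard(String a, String b) {
        Set<String> sa = tokens(a);
        Set<String> sb = tokens(b);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0; // convention chosen for this sketch
        }
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) intersection.size() / union.size();
    }
}
```

For example, "love me do" and "love me" share two of three distinct words, so the score is 2/3. Because titles are so short, a single differing word moves the score a lot, which is one reason to use it alongside other features rather than alone.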
Paul, for a pair-wise comparison Cosine Similarity does pretty well
for most purposes.
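A minimal sketch of what a pair-wise cosine comparison might look like outside Lucene, assuming raw term counts as vector weights and simple whitespace tokenisation (no stemming, stop words, or IDF). The class and method names are hypothetical, and this is not how Lucene computes its own scores.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: cosine similarity between two term-count vectors.
class CosineSketch {
    // Assumption: tokens are lower-cased and split on whitespace only.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        // Dot product over the terms the two sentences share.
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = 0.0;
        for (int v : ta.values()) normA += v * v;
        double normB = 0.0;
        for (int v : tb.values()) normB += v * v;
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty sentence matches nothing in this sketch
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Identical sentences score (approximately) 1.0 and sentences with no words in common score 0.0. Word order is ignored entirely, which may or may not matter for your data.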
On Wed, Dec 3, 2014 at 10:45 AM, Paul Taylor wrote:
On 03/12/2014 15:14, Barry Coughlan wrote:
Hi Paul,
I don't have much expertise in this area so hopefully others will answer,
but maybe this is better than nothing.
I don't know many out-of-the-box solutions for this problem, but I'm sure
they exist. Mahout and Carrot2 might be worth investigating.
Similarity Metrics:
- Jaccard Index. Me