Excuse the Off-Topic, but I'm looking for a Java API for determining
the degree of similarity (based on word frequency or whatever) between
two text strings.
I thought of posting here since I know some people here are experts
in semantic web technologies and could maybe help me.
Thanks
Ugo,
I think what you're looking for is the Levenshtein Distance Algorithm.
http://www.google.com/search?hl=en&q=java+Levenshtein+implementation&btnG=Google+Search
HTH,
Tony
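For readers who want to see what the Levenshtein suggestion amounts to, here is a minimal sketch in plain Java. It is not taken from any of the implementations linked in this thread; the class and method names (LevenshteinSketch, distance, similarity) are made up for illustration, and the normalization to a 0..1 score is just one common convention, assumed here.

```java
// Illustrative sketch only; names are hypothetical, not from any library.
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /**
     * Minimum number of single-character insertions, deletions and
     * substitutions needed to turn a into b (classic dynamic programming,
     * keeping only two rows of the table).
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j chars of b costs j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 for identical strings, 0.0 for maximally different ones. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }
}
```

Note that this measures character-level edits, so it suits near-duplicate strings and typos better than comparing what two texts are about.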
Excuse the Off-Topic, but I'm looking for a Java API for determining
the degree of similarity (based on word frequency or whatever) between
two text strings.
Commons Codec also has some algorithms.
It depends on what exactly you are after.
http://jakarta.apache.org/commons/codec/
cheers
On 15/Jun/05, at 16:32, Tony Collen wrote:
Ugo,
I think what you're looking for is the Levenshtein Distance Algorithm.
http://www.google.com/search?hl=en&q=java+Levenshtein+implementation&btnG=Google+Search
Nice! I also found an implementation nearby:
On 6/15/05, Ugo Cei wrote:
On 15/Jun/05, at 16:32, Tony Collen wrote:
<snip/>
Actually, what I am trying to come up with is an algorithm for determining
whether two texts are (more or less) about similar subjects.
Eee, then you may have to jump into the NLP stuff
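Before reaching for heavier NLP tooling, the word-frequency idea from the original question can be tried as a cosine similarity between bag-of-words vectors. The following is only a rough sketch, not something proposed by anyone in this thread: the class and method names are hypothetical, the tokenizer is a naive split on non-letter/non-digit characters, and there is no stemming, stop-word removal or term weighting. It assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, not from any library.
final class WordFrequencySimilarity {
    private WordFrequencySimilarity() {}

    /** Counts how often each lower-cased word occurs in the text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenizer (an assumption): split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Euclidean length of a term-frequency vector. */
    static double norm(Map<String, Integer> frequencies) {
        double sum = 0.0;
        for (int count : frequencies.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    /** Cosine of the angle between the two term-frequency vectors, in [0, 1]. */
    static double cosine(String a, String b) {
        Map<String, Integer> fa = termFrequencies(a);
        Map<String, Integer> fb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : fa.entrySet()) {
            Integer other = fb.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double na = norm(fa);
        double nb = norm(fb);
        // A text with no words has no direction; treat it as dissimilar to everything.
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }
}
```

Texts that share no words score 0 even if they are about the same subject, which is the gap that the NLP techniques mentioned here try to close.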
Peter Hunsberger wrote:
On 6/15/05, Ugo Cei wrote:
On 15/Jun/05, at 16:32, Tony Collen wrote:
<snip/>
Actually, what I am trying to come up with is an algorithm for determining
whether two texts are (more or less) about similar subjects.
Eee, then you may
On 15/Jun/05, at 18:27, Stefano Mazzocchi wrote:
I've been working on this for the past few months. There is no clear-cut
solution, but using LSI is probably the best approach for the above
LSI == ?
As for string distance, you might want to check out
secondstring.sf.net.
Ugo Cei wrote:
LSI == ?
latent semantic indexing