I didn't want to suggest a javadoc patch without first hearing from someone who knows more about the mathematical history behind it, because I didn't want to propose something that might be wrong. The Wikipedia article notes that papers are confused and inconsistent about what Tanimoto actually defined and how it compares to Jaccard. So I went to the primary source for Jaccard, and I'm getting the primary source for Tanimoto if and when interlibrary loan comes through.
On Mon, Apr 8, 2013 at 12:04 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> I don't see the problem here. We only want to compare two items so
> Jaccard and Tanimoto are identical.
>
> Could you file a JIRA and suggest a javadoc patch?
>
> Why did this take you to an ancient journal instead of Wikipedia?
>
>
> On Apr 7, 2013, at 6:54 AM, James Endicott wrote:
>
> > As far as I can tell, the difference between the two is that the Jaccard
> > similarity can only be used to compare two items using the formula:
> > items appearing in both documents/(items just appearing in one + items just
> > appearing in the other + items appearing in both)
> > But the Tanimoto similarity measure allows for comparing between any number
> > of items by generalizing the formula to:
> > items appearing in all documents/(items just appearing in one + items just
> > appearing in another + ... + items appearing in some but not all + ... +
> > items appearing in all)
> >
> > I think the class could be generalized to implement the full Tanimoto
> > similarity without too much difficulty (though I don't think it's a high
> > priority) but at the moment it does not do so. While I realize this is
> > probably a trivial matter, I hope the docs get updated at some point so
> > another grad student doesn't have to muddle through a botany article in a
> > Swiss journal from 1901 again.
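For anyone following along, here is a minimal Java sketch of the formulas quoted above. It is not the Mahout class, and `SetSimilarity` is a hypothetical name. The two-set `jaccard` is |A ∩ B| / |A ∪ B|. `generalizedJaccard` implements the n-set version exactly as I described it (items in all sets divided by items in at least one set). That describes my reading only; it is not a settled claim about what Tanimoto's paper defines. `tanimoto` uses the commonly cited vector form a·b / (a·a + b·b − a·b). On 0/1 vectors this reduces to the two-set Jaccard, which is Ted's point. Returning 0.0 for an empty union is my own assumed convention.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Mahout.
public class SetSimilarity {

    // Two-set Jaccard similarity: |A ∩ B| / |A ∪ B|.
    static <T> double jaccard(Set<T> a, Set<T> b) {
        return generalizedJaccard(List.of(a, b));
    }

    // n-set generalization as described in the thread:
    // (items appearing in all sets) / (items appearing in at least one set).
    static <T> double generalizedJaccard(List<Set<T>> sets) {
        if (sets.isEmpty()) {
            throw new IllegalArgumentException("need at least one set");
        }
        Set<T> intersection = new HashSet<>(sets.get(0));
        Set<T> union = new HashSet<>();
        for (Set<T> s : sets) {
            intersection.retainAll(s); // keep only items present in every set so far
            union.addAll(s);           // collect every item seen in any set
        }
        // Assumed convention: similarity of all-empty sets is 0.0.
        if (union.isEmpty()) {
            return 0.0;
        }
        return (double) intersection.size() / union.size();
    }

    // Vector form of the Tanimoto coefficient: a·b / (a·a + b·b - a·b).
    // For 0/1 vectors this equals the two-set Jaccard similarity.
    static double tanimoto(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, aa = 0, bb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        double denom = aa + bb - dot;
        return denom == 0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("a", "b", "c");
        Set<String> b = Set.of("b", "c", "d");
        Set<String> c = Set.of("c", "d", "e");

        System.out.println(jaccard(a, b));                         // prints 0.5
        System.out.println(generalizedJaccard(List.of(a, b, c)));  // prints 0.2

        // Same sets as 0/1 vectors over the universe {a, b, c, d, e}.
        double[] va = {1, 1, 1, 0, 0};
        double[] vb = {0, 1, 1, 1, 0};
        System.out.println(tanimoto(va, vb));                      // prints 0.5
    }
}
```

For two sets, the binary Tanimoto value and the Jaccard value agree. The only open question is whether the n-set version belongs in the javadoc at all.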