Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-21 Thread Philipp Thiel
Hi, you probably read about the Tanimoto coefficient being a proper metric for binary data in Leach and Gillet, 'Introduction to Chemoinformatics', chapter 5.3.1 of the revised edition. Best, Philipp Thiel > From: "David Cosgrove" > To: "Chris Earnshaw" > Cc:

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-21 Thread David Cosgrove
I used to have a paper that demonstrated that the Tanimoto coefficient does, in fact, obey the triangle inequality. I fear I lost access to it when I retired, but maybe a determined Google expert could rediscover it. I expect James means what we used to call the cluster seed, i.e. the molecule the
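
For readers who want to check the triangle-inequality claim numerically, here is a minimal sketch in plain Python. It assumes the statement refers to the Tanimoto distance (1 minus the Tanimoto coefficient) on binary fingerprints, and it models each fingerprint as a set of on-bit indices. The random fingerprints and the helper names (`tanimoto`, `tanimoto_distance`, `count_violations`) are illustrative only and are not taken from the thread or from RDKit. A random test like this is a sanity check, not a proof.

```python
import itertools
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def tanimoto_distance(a, b):
    """Distance form assumed here: 1 - Tanimoto similarity."""
    return 1.0 - tanimoto(a, b)


def count_violations(fps, tol=1e-12):
    """Count ordered triples (x, y, z) with d(x, z) > d(x, y) + d(y, z)."""
    violations = 0
    for x, y, z in itertools.permutations(fps, 3):
        lhs = tanimoto_distance(x, z)
        rhs = tanimoto_distance(x, y) + tanimoto_distance(y, z)
        if lhs > rhs + tol:
            violations += 1
    return violations


# Hypothetical data: 30 random 64-bit fingerprints with 1 to 32 bits set.
rng = random.Random(0)
fingerprints = [
    frozenset(rng.sample(range(64), rng.randint(1, 32))) for _ in range(30)
]

violations = count_violations(fingerprints)
print("triangle-inequality violations:", violations)
```

The same check could be run on real fingerprints by converting each one to the set of its on-bit indices first.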

Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-21 Thread Chris Earnshaw
Hi, I'm afraid I can't help with an RDKit solution to your question, but there are a couple of issues which should be borne in mind: 1) The centroid of a cluster is a vector mean of the fingerprints of all the members of the cluster and probably will not be represented *exactly* by any member of
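
To illustrate the point about centroids, here is a small sketch in plain Python, not an RDKit solution. It takes the centroid to be the per-bit mean of the member fingerprints, as described above, and shows that the result is generally a fractional vector that matches no member. Picking the closest member afterwards is an assumption added for illustration, as is the use of the continuous (dot-product) form of the Tanimoto coefficient to compare a real-valued centroid with binary members. The toy cluster and the function names are hypothetical.

```python
def centroid(fps):
    """Per-bit mean of equal-length binary fingerprints."""
    n = len(fps)
    return [sum(column) / n for column in zip(*fps)]


def continuous_tanimoto(u, v):
    """Tanimoto coefficient generalised to real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    return dot / denom if denom else 1.0


# Hypothetical toy cluster of three 6-bit fingerprints.
cluster = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]

centroid_fp = centroid(cluster)

# The centroid has fractional entries, so it equals no actual member.
is_a_member = centroid_fp in cluster

# Assumed workaround: report the member most similar to the centroid
# (ties resolve to the first such member).
nearest_index = max(
    range(len(cluster)),
    key=lambda i: continuous_tanimoto(cluster[i], centroid_fp),
)

print("centroid:", centroid_fp)
print("centroid is a member:", is_a_member)
print("index of nearest member:", nearest_index)
```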