Hi
There's a degree of confusion here depending on whether people are
considering *similarity* (Tanimoto, cosine, whatever) or *distance*
(however that's defined). The two are very different from each other.
Cosine *similarity* obeys the triangle inequality; cosine *distance*
doesn't - pretty
On 21/09/2018 16:53, Chris Earnshaw wrote:
Hi
I'm afraid I can't help with an RDKit solution to your question, but
there are a couple of issues which should be borne in mind:
1) The centroid of a cluster is a vector mean of the fingerprints of
all the members of the cluster and probably will not
On Sep 26, 2018, at 20:26, Peter S. Shenkin wrote:
> Ah, David, but how do you define a "real" singleton?
There can be many different definitions of what a '"real" singleton' might be,
but we are specifically talking about Butina clustering.
The Butina paper defines the term "false singleton",
Ah, David, but how do you define a "real" singleton?
-P.
On Wed, Sep 26, 2018 at 1:30 PM David Cosgrove wrote:
> Slightly off topic, but a minor issue with the Taylor-Butina algorithm is
> that it generates “false singletons”. These are molecules just outside the
> clustering cutoff that are
Slightly off topic, but a minor issue with the Taylor-Butina algorithm is
that it generates “false singletons”. These are molecules just outside the
clustering cutoff that are stranded when their neighbours are put in a
different, larger cluster. We used to find it convenient to have a sweep of
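To make the "false singleton" idea concrete, here is a minimal pure-Python sketch. It is not the RDKit implementation: it assumes fingerprints are given as sets of on-bits, that clustering uses distance = 1 - Tanimoto with a cutoff, and that seeds are taken in order of a neighbour count computed once up front. The names `butina_sketch` and `false_singletons` are hypothetical. A false singleton is taken here to be a singleton that does have neighbours inside the cutoff, all of which were claimed by an earlier cluster.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_sketch(fps, cutoff):
    """Simplified Taylor-Butina pass; returns (clusters, neighbour lists).

    Each cluster is a list of indices with the seed first.
    """
    n = len(fps)
    # Neighbour list: everything within the distance cutoff of each item.
    nbrs = [
        {j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff}
        for i in range(n)
    ]
    # Visit candidates by descending neighbour count (ties keep index order).
    order = sorted(range(n), key=lambda i: -len(nbrs[i]))
    assigned, clusters = set(), []
    for seed in order:
        if seed in assigned:
            continue
        members = [j for j in sorted(nbrs[seed]) if j not in assigned]
        clusters.append([seed] + members)
        assigned.update([seed] + members)
    return clusters, nbrs


def false_singletons(clusters, nbrs):
    """Singletons whose neighbours all ended up in other clusters."""
    return [c[0] for c in clusters if len(c) == 1 and nbrs[c[0]]]


# Toy data (made up for illustration): item 4 is close to item 3 only,
# and item 3 is claimed by the cluster seeded at item 0.
fps = [
    {1, 2, 3, 4, 5},
    {1, 2, 3, 4, 5, 6},
    {1, 2, 3, 4},
    {2, 3, 4, 5, 7},
    {3, 4, 5, 7, 8},
    {20, 21, 22},
]
clusters, nbrs = butina_sketch(fps, cutoff=0.4)
stranded = false_singletons(clusters, nbrs)
print(clusters)  # [[0, 1, 2, 3], [4], [5]]
print(stranded)  # [4]
```

Item 5 has no neighbours at all, so it is a genuine singleton; item 4 is stranded only because its one neighbour was taken by the larger cluster.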
Well, I'm not really familiar with the Taylor-Butina clustering method, so
I'm proposing a methodology based on generalizing something that I found to
be useful in a somewhat different clustering context.
Presuming that what you are clustering is the fingerprints of structures,
and that you know
On Sep 25, 2018, at 17:13, Peter S. Shenkin wrote:
> FWIW, in work on conformational clustering, I used the “most representative”
> molecule; that is, the real molecule closest to the mathematical centroid.
> This would probably be the best way of displaying a single molecule that
> typifies
(I see that I accidentally responded to Andrew, only, earlier; I'm copying
to the group this time.)
FWIW, in work on conformational clustering, I used the “most
representative” molecule; that is, the real molecule closest to the
mathematical centroid. This would probably be the best way of
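As a rough illustration of picking the "most representative" member, the sketch below computes the vector mean of a cluster and returns the index of the real member closest to it. The assumptions are mine, not from the thread: members are plain numeric vectors (bit vectors here), closeness is Euclidean distance, and the helper names `centroid` and `most_representative` are hypothetical.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of equal-length numeric vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_representative(vectors):
    """Index of the member closest (Euclidean) to the cluster centroid."""
    c = centroid(vectors)

    def dist(v):
        return sqrt(sum((x - y) ** 2 for x, y in zip(v, c)))

    return min(range(len(vectors)), key=lambda i: dist(vectors[i]))


# Toy cluster of four 4-bit fingerprints (made up for illustration).
cluster = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]
print(centroid(cluster))             # [1.0, 0.75, 0.75, 0.25]
print(most_representative(cluster))  # 1
```

The centroid has fractional components, so it does not coincide with any member; the second fingerprint is simply the nearest real one.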
On Sep 21, 2018, at 14:53, Philipp Thiel wrote:
> you probably read about the Tanimoto being a proper metric in case of having
> binary data
> in Leach and Gillet 'Introduction to Chemoinformatics' chapter 5.3.1 in the
> revised edition.
What we call Tanimoto is more broadly known as the
> Cc: "Rdkit-discuss@lists.sourceforge.net", "James T. Metz"
> Sent: Friday, 21 September, 2018 13:45:18
> Subject: Re: [Rdkit-discuss] Butina clustering with additional output
> I used to have a paper that demonstrated that the Tanimoto coefficient does,
I used to have a paper that demonstrated that the Tanimoto coefficient
does, in fact, obey the triangle inequality. I fear I lost access to it
when I retired but maybe a determined Google expert could rediscover it.
I expect James means what we used to call the cluster seed, i.e. the
molecule the
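For anyone wanting to sanity-check the triangle-inequality claim without the paper, here is a small brute-force sketch. It assumes the quantity of interest is the Tanimoto *distance*, 1 - T, on binary fingerprints represented as sets of on-bits, and it only enumerates the non-empty subsets of a 4-bit universe, so it is a spot check and not a proof. Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction
from itertools import combinations, product


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two sets of on-bits, as an exact fraction."""
    union = len(a | b)
    return 1 - Fraction(len(a & b), union) if union else Fraction(0)


# All non-empty fingerprints over a 4-bit universe (15 of them).
bits = range(4)
subsets = [frozenset(c) for r in range(1, 5) for c in combinations(bits, r)]

# Triples where d(a, c) > d(a, b) + d(b, c) would break the inequality.
violations = [
    (a, b, c)
    for a, b, c in product(subsets, repeat=3)
    if tanimoto_distance(a, c) > tanimoto_distance(a, b) + tanimoto_distance(b, c)
]

# The same test applied to the similarity itself (s = 1 - d), for contrast.
similarity_violations = [
    (a, b, c)
    for a, b, c in product(subsets, repeat=3)
    if (1 - tanimoto_distance(a, c))
    > (1 - tanimoto_distance(a, b)) + (1 - tanimoto_distance(b, c))
]

print(len(violations))                 # 0
print(len(similarity_violations) > 0)  # True
```

The distance passes on every triple, while the raw similarity fails as soon as two identical fingerprints are compared via a disjoint third one, which is the similarity-versus-distance distinction raised elsewhere in the thread.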
Hi
I'm afraid I can't help with an RDKit solution to your question, but there
are a couple of issues which should be borne in mind:
1) The centroid of a cluster is a vector mean of the fingerprints of all
the members of the cluster and probably will not be represented *exactly*
by any member of
RDKit Discussion Group,
I note that RDKit can perform Butina clustering. Given an SDF of small
molecules I would like to cluster the ligands, but obtain additional information
from the clustering algorithm. In particular, I would like to obtain the
cluster number and Tanimoto distance from