Wow - that was quick. Thanks again Greg - it's much appreciated. If I ever get
around to publishing the algorithm, I'll make sure to open-source it and contribute
it to RDKit.

Thanks
Jameed

-----Original Message-----
From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 12 March 2013 04:48
To: Jameed Hussain
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] RDKit fingerprint enhancement request

Dear Jameed,

On Mon, Mar 11, 2013 at 7:10 PM, Jameed Hussain <jameed.x.huss...@gsk.com> wrote:
>
<snip>
>
> I remember chatting to you at the UGM about this. It works okay - but
> it is slow (as you need to generate an fp for every atom you need the
> partial fp for) and can suffer from issues related to symmetry. Hence,
> I was wondering if you could add an option/enhancement to the
> topological fingerprinting code.
>
> Would it be possible to record the bits set for every atom in a given
> molecule as you generate the fingerprint. So something like a
> dictionary keyed on atom id with a value containing an array/set of
> the bits that get set for the atom. So as you hash a path, record the
> bits that are turned on against the ids of the atoms in the path.
> Hopefully, this isn't a large piece of work.
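A toy sketch of the per-atom bookkeeping described above, in plain Python. It is not RDKit's implementation: the paths for propan-1-ol (CCCO) are hard-coded as atom-index tuples rather than enumerated, and `toy_path_hash` and `fingerprint_with_atom_bits` are hypothetical helpers. The hash (a CRC32 of a direction-independent element-symbol key, folded into an assumed 2048-bit fingerprint) only stands in for RDKit's real path hashing.

```python
import zlib
from collections import defaultdict

FP_SIZE = 2048  # assumed fingerprint length for this toy example


def toy_path_hash(path_key: str, fp_size: int = FP_SIZE) -> int:
    """Hypothetical stand-in for RDKit's path hash: CRC32 of the key, folded into fp_size bits."""
    return zlib.crc32(path_key.encode("utf-8")) % fp_size


def fingerprint_with_atom_bits(paths, atom_symbols):
    """Set one bit per path and record that bit against every atom in the path.

    paths: iterable of tuples of atom indices (stand-in for RDKit's path enumeration).
    atom_symbols: element symbol for each atom index.
    Returns (on_bits, atom_bits), where atom_bits maps atom index -> set of bits.
    """
    on_bits = set()
    atom_bits = defaultdict(set)
    for path in paths:
        symbols = [atom_symbols[i] for i in path]
        # Use the same key for a path and its reverse, so C-O and O-C hash alike.
        key = min("-".join(symbols), "-".join(reversed(symbols)))
        bit = toy_path_hash(key)
        on_bits.add(bit)
        for atom_idx in path:
            atom_bits[atom_idx].add(bit)
    return on_bits, dict(atom_bits)


# Propan-1-ol (CCCO): a 4-atom chain; every path of 1 to 3 bonds, written as atom indices.
atom_symbols = ["C", "C", "C", "O"]
paths = [(0, 1), (1, 2), (2, 3), (0, 1, 2), (1, 2, 3), (0, 1, 2, 3)]
on_bits, atom_bits = fingerprint_with_atom_bits(paths, atom_symbols)
for atom_idx in sorted(atom_bits):
    print(atom_idx, sorted(atom_bits[atom_idx]))
```

The paths (0, 1) and (1, 2) both produce the key "C-C", so they set the same bit, and that bit ends up recorded for atoms 0, 1 and 2. A single pass therefore yields the bits for every atom.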

It's not.

>
> It would make the partial_fp generation much quicker as I would just
> need to generate the fp once and the data structure would contain all
> the information needed to generate the partial fp for any
> atom/substructure in the molecule (without the symmetry issues). It
> would also have the benefit of providing a data structure to explain
> the bits for the topological fingerprint like you have for the Morgan
> fingerprint. I hope that is enough to convince you ☺.
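One way the per-atom record could be used for partial fingerprints is sketched below. The email doesn't define "partial fp", so this assumes it means the bits set by any path touching at least one atom of the chosen atom/substructure. Under that assumption the partial fingerprint is simply the union of the recorded per-atom bit sets. `partial_fingerprint` is a hypothetical helper and the bit numbers are made up.

```python
from typing import Dict, Iterable, Set


def partial_fingerprint(atom_bits: Dict[int, Set[int]], atoms: Iterable[int]) -> Set[int]:
    """Bits set by any path that touches at least one of `atoms`.

    Assumes atom_bits maps atom index -> set of bits set by paths through that atom.
    Atoms with no recorded bits contribute nothing.
    """
    result: Set[int] = set()
    for idx in atoms:
        result |= atom_bits.get(idx, set())
    return result


# Hypothetical per-atom record for a 3-atom molecule (bit numbers are made up).
atom_bits = {0: {5, 17}, 1: {5, 17, 42}, 2: {42, 99}}
print(sorted(partial_fingerprint(atom_bits, [0])))     # [5, 17]
print(sorted(partial_fingerprint(atom_bits, [0, 2])))  # [5, 17, 42, 99]
```

No fingerprint has to be regenerated per atom; any atom subset is answered from the one stored record.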

You had me already... this is just a nice extra bit. :-)

> Lastly, there isn't any urgency as I have a slow implementation - I
> just want to make it quicker.

I just checked in an initial implementation. This will slow the fingerprinter 
down somewhat when you're using the option, but it shouldn't be that bad 
compared to the general slowness of the fingerprinter.

From the Python side it looks like this:

In [1]: from rdkit import Chem

In [2]: l = []

In [3]: fp=Chem.RDKFingerprint(Chem.MolFromSmiles('CCCO'),minPath=1,maxPath=3,nBitsPerHash=1,atomBits=l)

In [4]: list(fp.GetOnBits())
Out[4]: [242, 591, 718, 820, 1485]

In [6]: l
Out[6]:
[[718, 820, 1485],
 [718, 820, 591, 1485],
 [718, 242, 820, 591, 1485],
 [242, 591, 1485]]
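The per-atom lists in `l` can also be inverted into a bit -> atoms map, similar in spirit to the `bitInfo` dictionary available for Morgan fingerprints. Here is a small plain-Python sketch using the values printed above (list position = atom index). `explain_bits` is a hypothetical helper, not an RDKit function.

```python
from collections import defaultdict

# Per-atom bit lists copied from Out[6] above (atom index = list position).
atom_bits = [
    [718, 820, 1485],
    [718, 820, 591, 1485],
    [718, 242, 820, 591, 1485],
    [242, 591, 1485],
]
on_bits = [242, 591, 718, 820, 1485]  # copied from Out[4] above


def explain_bits(atom_bits):
    """Invert per-atom bit lists into bit -> sorted list of atoms whose paths set it."""
    info = defaultdict(set)
    for atom_idx, bits in enumerate(atom_bits):
        for bit in bits:
            info[bit].add(atom_idx)
    return {bit: sorted(atoms) for bit, atoms in sorted(info.items())}


info = explain_bits(atom_bits)
print(info)
# {242: [2, 3], 591: [1, 2, 3], 718: [0, 1, 2], 820: [0, 1, 2], 1485: [0, 1, 2, 3]}
```

Every on-bit from Out[4] appears as a key. For example, bit 242 was recorded only for atoms 2 and 3.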


-greg


________________________________

This e-mail was sent by GlaxoSmithKline Services Unlimited
(registered in England and Wales No. 1047315), which is a
member of the GlaxoSmithKline group of companies. The
registered address of GlaxoSmithKline Services Unlimited
is 980 Great West Road, Brentford, Middlesex TW8 9GS.


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
