Hi,

I have a set of, say, 1000 generated fingerprints, each of length 39972. Many 
bit positions hold the same value in all 1000 fingerprints, so they contain no 
information about the differences between the 1000 molecules.

For example, given the list

010100001
010110100
010101110
010100010

The first four bits are redundant, so I could just record the fingerprints as:

00001
10100
01110
00010

In reality, the redundant bits are distributed through the bit string, so I 
need a method to determine which bits are redundant, and then remove them from 
each fingerprint.
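One way to do this, sketched in plain Python and assuming the fingerprints are held as equal-length '0'/'1' strings (such as those returned by ToBitString()): treat the set as a table, keep only the bit positions (columns) where more than one value occurs, then apply that list of kept positions to every fingerprint. The function names informative_positions, apply_mask and remove_constant_bits are hypothetical helpers for illustration, not RDKit API.

```python
from typing import List, Sequence, Tuple


def informative_positions(bitstrings: Sequence[str]) -> List[int]:
    """Indices of bit positions whose value differs in at least one fingerprint."""
    if not bitstrings:
        return []
    n_bits = len(bitstrings[0])
    # zip() silently truncates to the shortest string, so check lengths first
    if any(len(s) != n_bits for s in bitstrings):
        raise ValueError("all fingerprints must have the same length")
    # zip(*bitstrings) yields one tuple per bit position (a column of the set);
    # a column with a single distinct value carries no information
    return [i for i, column in enumerate(zip(*bitstrings)) if len(set(column)) > 1]


def apply_mask(bitstring: str, keep: Sequence[int]) -> str:
    """Reduce one fingerprint to the kept positions, preserving their order."""
    return "".join(bitstring[i] for i in keep)


def remove_constant_bits(bitstrings: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Return the kept positions and the reduced fingerprints."""
    keep = informative_positions(bitstrings)
    return keep, [apply_mask(s, keep) for s in bitstrings]


if __name__ == "__main__":
    fps = ["010100001", "010110100", "010101110", "010100010"]
    keep, reduced = remove_constant_bits(fps)
    print(keep)  # [4, 5, 6, 7, 8]
    print(reduced)  # ['00001', '10100', '01110', '00010']
```

Keep the returned position list: any fingerprint computed later has to be reduced with the same positions (via apply_mask) to stay comparable with the reduced set. Note also that a bit that is constant across these 1000 molecules may still vary for other molecules.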

Cheers,
Steve.



From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 04 June 2014 04:40
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] remove redundant bits from bitvector fingerprints

Hi Steve,

On Tue, Jun 3, 2014 at 2:08 PM, Stephen O'hagan 
<soha...@manchester.ac.uk> wrote:
I have a fragment of code that generates fingerprints for a long list of 
molecules (~1000 of them):

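from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm2D import Generate

# smi (a list of SMILES) and factory (a Pharm2D SigFactory) are assumed to be defined earlier
bs = [None] * len(smi)
for index, smiles in enumerate(smi):
    mol = Chem.MolFromSmiles(smiles)
    AllChem.EmbedMolecule(mol)              # generate 3D coordinates
    AllChem.UFFOptimizeMolecule(mol)        # clean up the geometry with UFF
    dm = Chem.Get3DDistanceMatrix(mol)      # through-space interatomic distances
    fp = Generate.Gen2DFingerprint(mol, factory, dMat=dm)
    bs[index] = fp.ToBitString()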

Each bit vector generated has length 39972, and the list contains a lot of 
redundant ‘1’s and ‘0’s.

Is there an easy method to filter out these redundant bits?

What do you mean by redundant bits?

The length of the bit vectors is determined by the parameters you provide for 
building the pharmacophore fingerprints (number of points, number of features, 
and number of distance bins). The length of the strings that you get from 
fp.ToBitString() should be equal to this number of bits.

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to