[Rdkit-discuss] Difference between ECFP and MorganFingerprint

2015-09-29 Thread Jing Lu
Dear RDKit community, I was treating AllChem.GetMorganFingerprint(m1,2) the same as ECFP4. I am writing a paper for an open-source tool, so I need to be very accurate. I have seen one open-source implementation of ECFP, which is from CDK. Most researchers are using Pipeline Pilot to calculate ECFP

[Rdkit-discuss] atom size when plotting a molecule

2015-09-25 Thread Jing Lu
Dear RDKit users, I use Draw.MolsToFile for plotting 2D molecules. However, the atom size is relatively small. This is OK for nitrogen and oxygen, but it's not very clear for atoms like F and Cl. Is there a better way to do it? Thanks, Jing

Re: [Rdkit-discuss] generating scaffold trees

2015-09-03 Thread Jing Lu
Just out of curiosity, I have seen many publications about scaffold tree generation. Like the Scaffold Tree[1], Scaffold Hunter[2], inSARa, Fragment-Augmented Molecular Hasse Diagrams, Snowflake Diagram[3]... How do you guys choose among them? I haven't seen any comparison paper for those methods,

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-28 Thread Jing Lu
>> Hi Jing, >> >> Most fingerprints are binary, thus can be stored as np.bool_, which >> compared to double should be 64 times more memory efficient. >> >> Best, >> Maciej >> >> >> Pozdrawiam, | Best regards, >> Maciek Wójciko
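A rough check of the memory figures discussed in this thread, as a minimal sketch in plain Python. The fingerprint length of 2048 bits is an assumption, and the 1M molecule count comes from the thread title. Note that a NumPy `np.bool_` array uses one byte per element, so it is 8 times smaller than `float64`; the 64-fold saving is only reached when the bits are packed eight to a byte.

```python
# Back-of-the-envelope memory estimate for a fingerprint matrix.
# N_BITS = 2048 is an assumed fingerprint length, not taken from the post.
N_MOLS = 1_000_000
N_BITS = 2048


def matrix_bytes(n_mols, n_bits, bytes_per_element):
    """Bytes needed when every fingerprint bit occupies one array element."""
    return n_mols * n_bits * bytes_per_element


as_float64 = matrix_bytes(N_MOLS, N_BITS, 8)  # 8 bytes per float64 element
as_bool = matrix_bytes(N_MOLS, N_BITS, 1)     # 1 byte per np.bool_ element
as_packed = N_MOLS * N_BITS // 8              # 8 fingerprint bits per byte

for label, size in [("float64", as_float64),
                    ("bool", as_bool),
                    ("packed bits", as_packed)]:
    print(f"{label:12s} {size / 1e9:8.3f} GB")

print("float64 / bool   =", as_float64 // as_bool)
print("float64 / packed =", as_float64 // as_packed)
```

Under these assumptions the matrix needs about 16.4 GB as `float64`, about 2 GB as booleans, and about 0.26 GB as packed bits.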

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-27 Thread Jing Lu
Hi Greg, Thanks! It works! But is it possible to fold the fingerprint to a smaller size? np.zeros((100,2048)) still takes a lot of memory... Best, Jing On Wed, Aug 26, 2015 at 11:02 PM, Greg Landrum wrote: > > On Thu, Aug 27, 2015 at 3:00 AM, Jing Lu wrote: > >> >
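The folding asked about here can be sketched in plain Python: bit `i` of the long fingerprint maps to bit `i % folded_size`, which is the same as OR-ing the equal-length segments together. The function name `fold_bits` and the set-of-indices representation are illustrative assumptions; this sketch does not call RDKit.

```python
def fold_bits(on_bits, n_bits, fold_factor):
    """Fold a bit fingerprint, given as a set of on-bit indices, by fold_factor.

    Bits that collide after folding are merged (logical OR), so folding
    saves memory at the cost of some information.
    """
    if fold_factor < 1 or n_bits % fold_factor != 0:
        raise ValueError("n_bits must be divisible by fold_factor")
    folded_size = n_bits // fold_factor
    return {bit % folded_size for bit in on_bits}, folded_size


# Hypothetical 2048-bit fingerprint with four bits set.
fp = {3, 700, 1027, 2047}
folded, size = fold_bits(fp, 2048, 2)
print(sorted(folded), size)
```

In this example bits 3 and 1027 collide after folding, which is why the folded fingerprint has three on bits rather than four.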

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-26 Thread Jing Lu
PM, Jing Lu wrote: > > I hope the memory issue won't be a problem. > > That's up to you and your choice of threshold. > > > Most AgglomerativeClustering algorithms have time complexity with N^2. > Will that be a problem? > > You have to decide for yourse

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-23 Thread Jing Lu
ayon/ > It's not function of RDKit, but I think the library can cluster molecules > using ECFP4. > > Unfortunately, input file format of bayon is not distance matrix but easy > to prepare the format. > > Best regards. > > Takayuki > > > 2015年8月23日(日) 12:03

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-23 Thread Jing Lu
be a problem? Best, Jing On Sun, Aug 23, 2015 at 3:13 AM, Andrew Dalke wrote: > On Aug 23, 2015, at 3:43 AM, Jing Lu wrote: > > If I want to cluster more than 1M molecules by ECFP4. How could I do it? > If I calculate the distance between every pair of molecules, the size of >

Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-22 Thread Jing Lu
cluster it and then put the respective scaffold compounds inside the > cluster . > > Sent from my iPhone > > > On Aug 22, 2015, at 8:43 PM, Jing Lu wrote: > > > > Dear RDKit users, > > > > If I want to cluster more than 1M molecules by ECFP4. How could I do

[Rdkit-discuss] Clustering 1M molecules

2015-08-22 Thread Jing Lu
Dear RDKit users, If I want to cluster more than 1M molecules by ECFP4, how could I do it? If I calculate the distance between every pair of molecules, the size of the distance matrix will be too big. Does RDKit support any heuristic clustering algorithm without calculating the distance matrix of the
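To make the size concern concrete, and to illustrate one family of heuristics that avoids the full matrix, here is a minimal sketch in plain Python. It first counts the pairwise distances for N molecules, then runs a simple leader (sequential) clustering that compares each fingerprint only against the current cluster representatives. This is an illustrative technique, not an RDKit function; the names `tanimoto` and `leader_cluster`, the set-of-on-bits representation, and the threshold value are all assumptions.

```python
def n_pairs(n):
    """Number of entries in a condensed (upper-triangle) distance matrix."""
    return n * (n - 1) // 2


# 1M molecules, 8 bytes per stored distance (float64).
print(n_pairs(1_000_000) * 8 / 1e12, "TB for the full distance matrix")


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def leader_cluster(fingerprints, threshold):
    """Assign each fingerprint to the first leader it is similar enough to.

    Only leader-vs-candidate similarities are computed, so no N x N matrix
    is stored. The result depends on input order, and the worst case is
    still quadratic when almost every molecule becomes its own leader.
    """
    leaders = []   # index of each cluster's representative
    clusters = []  # member indices, parallel to `leaders`
    for i, fp in enumerate(fingerprints):
        for c, leader_idx in enumerate(leaders):
            if tanimoto(fp, fingerprints[leader_idx]) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(i)
            clusters.append([i])
    return clusters


# Toy fingerprints (hypothetical on-bit sets, not real ECFP4 output).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {1, 2, 3, 4, 5}]
clusters = leader_cluster(fps, threshold=0.5)
print(clusters)
```

For 1M molecules the condensed matrix alone holds roughly 5 x 10^11 distances, about 4 TB at 8 bytes each, which is why approaches that never materialize it are attractive.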