Re: [OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-14 Thread Craig A. James
On 10/14/10 6:43 AM, Tim Vandermeersch wrote:
> On Wed, Oct 13, 2010 at 8:58 PM, Geoffrey Hutchison wrote:
>>> 3429680: [Li+251] 24639246
>>> 3429701: [ClH+276] 24639289
>>> 3429702: [n+251]1(C)c1 24639291

I'd be happy if these were just tossed out. They're obviously not valid, and a

Re: [OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-14 Thread Geoffrey Hutchison
On Oct 14, 2010, at 9:43 AM, Tim Vandermeersch wrote:
> Yes, but we should handle this by throwing an error. There are probably more
> of these cases though.

I'm adding that now. It will complain if the charge is above 10, or if the positive charge is greater than the number of electrons (e.g.
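
The two checks described in this message can be sketched roughly as follows. This is a minimal Python illustration, not the code that went into Open Babel (which is C++). The name `charge_problem`, the small element table, and the choice to apply the limit of 10 to the magnitude of the charge are all assumptions made for the example. The electron count of the neutral atom is taken to be its atomic number.

```python
# Hypothetical helper for illustration only; not Open Babel's implementation.
# Atomic numbers for a few elements (equal to the electron count of the neutral atom).
ATOMIC_NUMBER = {"H": 1, "Li": 3, "C": 6, "N": 7, "O": 8, "Cl": 17}


def charge_problem(symbol, charge, limit=10):
    """Return a complaint string, or None if the charge passes both checks."""
    # Check 1: the charge is above the limit (assumed here to mean its magnitude).
    if abs(charge) > limit:
        return f"charge {charge:+d} on {symbol} is above the limit of {limit}"
    # Check 2: a positive charge cannot remove more electrons than the atom has.
    electrons = ATOMIC_NUMBER[symbol]
    if charge > electrons:
        return f"charge {charge:+d} on {symbol} exceeds its {electrons} electrons"
    return None


if __name__ == "__main__":
    for symbol, charge in [("Li", 251), ("Cl", 276), ("Li", 4), ("Li", 1)]:
        print(symbol, charge, charge_problem(symbol, charge))
```

With these assumptions, `[Li+251]` and `[ClH+276]` trip the first check, a hypothetical `[Li+4]` trips the second, and `[Li+]` passes.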

Re: [OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-14 Thread Chris Morley
On 14/10/2010 14:43, Tim Vandermeersch wrote:
> On Wed, Oct 13, 2010 at 8:58 PM, Geoffrey Hutchison wrote:
>>> 3429680: [Li+251] 24639246
>>> 3429701: [ClH+276] 24639289
>>> 3429702: [n+251]1(C)c1 24639291
>>
>> These are an easy fix to the SMILES reader. Currently, we only take single

Re: [OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-14 Thread Tim Vandermeersch
On Wed, Oct 13, 2010 at 8:58 PM, Geoffrey Hutchison wrote:
>> 3429680: [Li+251] 24639246
>> 3429701: [ClH+276] 24639289
>> 3429702: [n+251]1(C)c1 24639291
>
> These are an easy fix to the SMILES reader. Currently, we only take single
> digits for charge. Now I think the molecule is prepos

Re: [OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-13 Thread Geoffrey Hutchison
> 3429680: [Li+251] 24639246
> 3429701: [ClH+276] 24639289
> 3429702: [n+251]1(C)c1 24639291

These are an easy fix to the SMILES reader. Currently, we only take single digits for charge. Now I think the molecule is preposterous (e.g., lithium doesn't have 251 electrons to remove!) but I
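
For illustration, the single-digit limitation could be lifted along these lines. This is a minimal Python sketch under stated assumptions, not Open Babel's SMILES reader (which is C++). `parse_charge` is a hypothetical name, and treating repeated signs such as `++` as +2 is an assumption based on common SMILES usage.

```python
# Hypothetical sketch; names and structure are not taken from Open Babel.
DIGITS = "0123456789"


def parse_charge(s, i=0):
    """Parse a SMILES-style charge starting at s[i].

    Returns (charge, next_index). If s[i] is not a sign, returns (0, i).
    """
    if i >= len(s) or s[i] not in "+-":
        return 0, i
    sign_char = s[i]
    sign = 1 if sign_char == "+" else -1
    i += 1
    # Read every digit that follows the sign, not just the first one.
    start = i
    while i < len(s) and s[i] in DIGITS:
        i += 1
    if i > start:
        return sign * int(s[start:i]), i
    # No digits: count repeated sign characters, so "++" is read as +2.
    count = 1
    while i < len(s) and s[i] == sign_char:
        count += 1
        i += 1
    return sign * count, i


if __name__ == "__main__":
    # The charge in "[Li+251]" starts at index 3, right after the element symbol.
    print(parse_charge("[Li+251]", 3))
```

A reader that consumes only one digit would take `+2` from `[Li+251]` and then trip over the leftover `51`. Reading the whole number lets a later sanity check reject the value cleanly.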

[OpenBabel-Devel] Canonicalization results for eMolecules (5 million compounds)

2010-10-13 Thread Tim Vandermeersch
Hi,

Here are the results from the shuffle (10x) test for the 5 million compounds in the eMolecules database. In general the results are good, and only 33 canonicalization errors remain, which should be easy to fix.

Process stops: 3429680, 3429701, 3429702, 3429717, 3429742, 3429767, 3429887, ... (
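
For readers unfamiliar with this kind of test, the idea can be sketched as below. The sketch assumes that "shuffle (10x)" means reordering a molecule's atoms ten times and checking that the canonical output never changes; the message itself does not spell this out. Everything here is hypothetical: molecules are plain atom and bond lists, and `brute_force_canonical` is a toy stand-in that tries every permutation, usable only on tiny inputs and unrelated to Open Babel's actual algorithm.

```python
import itertools
import random


def relabel(atoms, bonds, perm):
    """Renumber atoms using perm (old index -> new index); return hashable tuples."""
    new_atoms = [None] * len(atoms)
    for old, new in enumerate(perm):
        new_atoms[new] = atoms[old]
    new_bonds = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
    return tuple(new_atoms), tuple(new_bonds)


def brute_force_canonical(atoms, bonds):
    """Toy canonicalizer: the smallest relabeling over all atom permutations."""
    n = len(atoms)
    return min(relabel(atoms, bonds, p) for p in itertools.permutations(range(n)))


def shuffle_test(atoms, bonds, canonicalize, rounds=10, seed=0):
    """Count the rounds in which a shuffled copy canonicalizes differently."""
    rng = random.Random(seed)
    reference = canonicalize(atoms, bonds)
    failures = 0
    for _ in range(rounds):
        perm = list(range(len(atoms)))
        rng.shuffle(perm)
        shuffled_atoms, shuffled_bonds = relabel(atoms, bonds, perm)
        if canonicalize(list(shuffled_atoms), list(shuffled_bonds)) != reference:
            failures += 1
    return failures


if __name__ == "__main__":
    # Ethanol heavy atoms as a C-C-O chain.
    atoms, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
    print(shuffle_test(atoms, bonds, brute_force_canonical))
```

A correct canonicalizer gives zero failures no matter how the atoms are ordered on input. Any nonzero count points to a canonicalization error of the kind counted above.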