Would it be okay to do that as part of a separate pull request? That is,
if there are no other concerns, could you merge it as is? The easiest
way to do the pruning is to comment out the relevant code one piece at
a time and see whether the results change (for the worse). This will
take some time, but it
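
The one-at-a-time pruning described above can be sketched as a leave-one-out loop. This is only an illustration: `find_prunable` and `toy_run` are hypothetical names, the "patterns" are plain substrings rather than real SMARTS, and any change in the results is treated as significant rather than only changes for the worse.

```python
from typing import Callable, List, Sequence


def find_prunable(patterns: Sequence[str],
                  run: Callable[[Sequence[str]], list]) -> List[str]:
    """Return the patterns whose individual removal leaves run() unchanged.

    `run` is a hypothetical stand-in for "regenerate the output over the
    test databases with this set of patterns enabled".
    """
    baseline = run(patterns)
    prunable = []
    for i, pattern in enumerate(patterns):
        # Disable exactly one pattern, keeping all the others.
        reduced = list(patterns[:i]) + list(patterns[i + 1:])
        if run(reduced) == baseline:
            prunable.append(pattern)
    return prunable


def toy_run(patterns: Sequence[str]) -> list:
    """Toy stand-in: does any pattern occur as a substring of each entry?"""
    corpus = ["xabx", "cdcd", "abcd"]
    return [any(p in entry for p in patterns) for entry in corpus]


candidates = find_prunable(["ab", "cd", "zz"], toy_run)
```

Patterns flagged this way are only individually redundant. Two patterns that cover the same case would both be flagged, so removals still need to be re-checked together.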
> Personally, I'd also like to remove any "patterns" that aren't
> triggered by (aromatic atoms in) molecules in any of these databases,
> on the basis that it's better to have a set of patterns that we know
> are correct (and all covered by test cases)
I’d be fine with some pruning. I’ve been
Ok - I'm getting somewhere now. I've confirmed that there is a problem
with the use of 'D' in the current codebase. For example, for protonated
imidazole (as in histidine in vivo), two different answers are found
depending on whether hydrogens are explicit or not:
C:\Users\noel>obabel -:Cc1[nH]c[nH+]c1
Maybe I'm overthinking. If it doesn't change the final output (as
regards aromatic SMILES) on ChEMBL, maybe it's not worth worrying
about now.
- Noel
On 30 January 2017 at 18:31, Noel O'Boyle wrote:
> Great. One question I've run into is what was the intention of the D2
>
Great. One question I've run into is what was the intention of the D2
etc. in the SMARTS patterns. Was it the number of heavy atom neighbors?
As written, it's the number of explicit neighbors in the graph, which is
complicated by the fact that OB's SMILES parser currently adds an
explicit H for H's
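
To make the distinction concrete, here is a small toy model of the two readings of a degree count. It is not Open Babel code: the neighbor lists are hand-written for a ring [nH] nitrogen, and the helper names `explicit_degree` and `heavy_degree` are invented for this sketch.

```python
from typing import List


def explicit_degree(neighbors: List[str]) -> int:
    """Count every neighbor atom that is an explicit node in the graph."""
    return len(neighbors)


def heavy_degree(neighbors: List[str]) -> int:
    """Count only non-hydrogen neighbors."""
    return sum(1 for symbol in neighbors if symbol != "H")


# Ring [nH] nitrogen with its hydrogen kept implicit: two carbon neighbors.
n_implicit_h = ["C", "C"]
# The same nitrogen after its hydrogen has been made an explicit graph node.
n_explicit_h = ["C", "C", "H"]

degrees = {
    "explicit, H implicit": explicit_degree(n_implicit_h),
    "explicit, H explicit": explicit_degree(n_explicit_h),
    "heavy, H implicit": heavy_degree(n_implicit_h),
    "heavy, H explicit": heavy_degree(n_explicit_h),
}
```

In this toy model the explicit count changes from 2 to 3 once the hydrogen becomes a graph node, while the heavy-atom count stays at 2. A pattern written with D2 in mind would therefore match or fail depending on how the hydrogens were represented.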
I think it's a great idea. Chris Morley had recommended similar concepts in
terms of implicit valence.
Yes, many of the stranger SMARTS patterns here are for "dodgy" SMILES that
should retain aromaticity. It's possible, perhaps, to set some level of "if
it was initially flagged as an aromatic