Dear Kovas,
you should be able to achieve what you need by applying the following patch
and rebuilding the RDKit:
--- Code/GraphMol/Atom.cpp 2018-08-23 19:33:34.669598140 +0100
+++ Code/GraphMol/Atom.cpp 2018-08-24 19:02:18.308912142 +0100
@@ -432,7 +432,8 @@
bool Atom::Match(Atom const *what) const {
PRECONDITION(what, "bad query atom");
- bool res = getAtomicNum() == what->getAtomicNum();
+ bool res = getAtomicNum() == what->getAtomicNum()
+ || ((!getAtomicNum() && hasQuery()) || (!what->getAtomicNum() && what->hasQuery()));
// special dummy--dummy match case:
// [*] matches [*],[1*],[2*],etc.
This change does not break any existing unit tests, and does what you need:
from rdkit import Chem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Draw import IPythonConsole
m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')
m1
m2
qp = Chem.AdjustQueryParameters()
qp.makeDummiesQueries = True
m1 = Chem.AdjustQueryProperties(m1, qp)
m2 = Chem.AdjustQueryProperties(m2, qp)
m1.GetSubstructMatches(m2)
# -> ((0, 1, 2, 3, 4),)
m2.GetSubstructMatches(m1)
# -> ((0, 1, 2, 3, 4),)
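A rough way to see why the patch makes matching work in both directions is to mirror its boolean expression in plain Python. This is a toy sketch, not RDKit code: `ToyAtom`, `patched_atom_match` and `all_pairs_match` are hypothetical names, atoms are reduced to an atomic number plus a "has query" flag, the rest of `Atom::Match` (such as the dummy-dummy special case) is ignored, and atoms are paired in SMILES order instead of by a real substructure search, which only works here because m1 and m2 share the same skeleton.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an RDKit atom (not part of the RDKit API)."""
    atomic_num: int
    has_query: bool = False  # assumed True only for dummies after makeDummiesQueries


def patched_atom_match(this: ToyAtom, what: ToyAtom) -> bool:
    """Mirrors the patched expression in Atom::Match; `this` is the query-side atom."""
    return (this.atomic_num == what.atomic_num
            or (this.atomic_num == 0 and this.has_query)
            or (what.atomic_num == 0 and what.has_query))


def all_pairs_match(query: List[ToyAtom], target: List[ToyAtom]) -> bool:
    # Atoms are paired in SMILES order; this only stands in for a real
    # substructure search because m1 and m2 share the same skeleton.
    return all(patched_atom_match(q, t) for q, t in zip(query, target))


DUMMY = ToyAtom(0, has_query=True)
C, F = ToyAtom(6), ToyAtom(9)
m1 = [DUMMY, C, C, C, C]  # [*:1][CH2:2][C:3]([CH3:4])=[CH2:5]
m2 = [F, C, C, DUMMY, C]  # [F:11][CH2:12][C:13]([*:14])=[CH2:15]

print(all_pairs_match(m1, m2), all_pairs_match(m2, m1))  # True True
# A dummy that was never turned into a query gets no special treatment:
print(patched_atom_match(C, ToyAtom(0)))  # False
```

The last check suggests why `makeDummiesQueries = True` is still set in the session above: in this model, the patched rule only relaxes matching for dummies that carry a query.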
HTH, cheers
p.
On 08/23/18 18:20, Kovas Palunas wrote:
Thanks for the feedback and code example!
I understand that it works to use MCS to build a third query mol that
matches both of the original mols, and then to match against it. However,
this seems like overkill (i.e. overly expensive) for this particular
problem; as I understand it, MCS can be very expensive depending on the
compounds being compared. Would it not work to simply override the
atom.Match function with one that always matches dummies, no matter
what the other atom is? I am not planning to match SMARTS-style queries
of any complexity beyond simple dummy atoms. In fact, as I understand
it, my example compounds do not contain any query atoms when they are
read into RDKit; the dummies are only turned into queries afterwards by
the AdjustQueryParameters code. I am definitely not interested in doing
generic query-to-query matching.
- Kovas
*From: *Christos Kannas <chriskan...@gmail.com>
*Date: *Thursday, August 23, 2018 at 7:53 AM
*To: *Kovas Palunas <kovas.palu...@arzeda.com>
*Cc: *RDKit <rdkit-discuss@lists.sourceforge.net>, Paolo Tosco
<paolo.tosco.m...@gmail.com>
*Subject: *Re: [Rdkit-discuss] Matching Generalized Compounds
Hi Kovas,
You have two fuzzy compounds that you are trying to match, because our
intuition says that the wildcard atom [*:1] in m1 should match the
fluorine [F:11] in m2, and the wildcard atom [*:14] in m2 should match
the carbon [CH3:4] in m1.
The issue here is that you create two query compounds from m1 and m2,
each of which will only match its own specific substructures. Query-to-query
matching is not trivial.
In order to do what you want, you need a query compound that combines
their characteristics, which is what Paolo showed.
Using MCS and modifying atom properties, Paolo created the query
compound '[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]' or
'[*:1]-[CH2X4:2]-[CX3:3](-[*:4])=[CH2X3:5]'.
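The idea of a single query that combines both compounds can be sketched with element symbols standing in for atoms. In this sketch, `merge_to_generic_query` and `query_matches` are hypothetical names, and the two atom lists are assumed to be already aligned in the same order; in Paolo's notebook that alignment comes from FindMCS(), which is not reproduced here. Positions where the compounds disagree become wildcards, and matching is one-directional, as with a real query.

```python
from typing import List

WILDCARD = "*"


def merge_to_generic_query(a: List[str], b: List[str]) -> List[str]:
    """Combine two aligned atom lists (same skeleton, same order) into one query.

    Positions where the element symbols agree are kept; positions where they
    differ (including where one side is a dummy) become wildcards.
    """
    if len(a) != len(b):
        raise ValueError("molecules must share the same skeleton")
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def query_matches(query: List[str], target: List[str]) -> bool:
    # One-directional: only wildcards in the *query* may match any atom.
    return all(q == WILDCARD or q == t for q, t in zip(query, target))


m1 = ["*", "C", "C", "C", "C"]  # [*:1][CH2:2][C:3]([CH3:4])=[CH2:5]
m2 = ["F", "C", "C", "*", "C"]  # [F:11][CH2:12][C:13]([*:14])=[CH2:15]

generic = merge_to_generic_query(m1, m2)
print(generic)  # ['*', 'C', 'C', '*', 'C']
print(query_matches(generic, m1), query_matches(generic, m2))  # True True
print(query_matches(m1, m2), query_matches(m2, m1))  # False False
```

The merged list corresponds to the skeleton of '[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]', and the last line shows that neither original compound, used directly as a query, matches the other.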
Also bear in mind that Paolo's approach changed the starting
compounds, as they now resemble the generic query compound that
combines their fuzzy atoms.
https://gist.github.com/CKannas/ac1a4791dec909552d7c8899cfaff030
Best,
Christos
Christos Kannas
Chem[o]informatics Researcher & Software Developer
On Thu, 23 Aug 2018 at 12:36, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
Dear Kovas,
It looks like GetSubstructMatch() only finds a match if the dummy
atom is in the query, not if it is in the molecule you are
matching the query against.
This notebook presents a possible solution off the top of my head:
https://gist.github.com/ptosco/a35ac28a14103b47096f6d6af1aec831
which does not involve changes to the C++ layer, even though it is
computationally more expensive and will fail with disconnected
fragments, as it uses FindMCS(). There may be better solutions;
this is what I came up with last night in the little time I
had available.
Cheers,
P.
On 08/22/18 19:34, Kovas Palunas wrote:
Hi All,
I’m interested in having GetSubstructMatches return non-“null”
results in the following example. The results should lead to
a match where atom 1 maps to atom 11, 2 to 12, etc.
m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')
### do something here so that the mols will match ###
qp = Chem.AdjustQueryParameters()
qp.makeDummiesQueries = True
m1 = Chem.AdjustQueryProperties(m1, qp)
m2 = Chem.AdjustQueryProperties(m2, qp)
# I’d like both of the following to return results
m1.GetSubstructMatches(m2)
m2.GetSubstructMatches(m1)
My understanding of why these mols currently do not match is
as follows: because only the dummy atoms are made queries
(based on my query parameter adjustment), when one mol is
matched to the other, dummy 1 may match F:11, but dummy 14
will then not match the methyl CH3:4. This is because (as I
understand it) normal atoms can only be matched by queries, and
cannot themselves match queries.
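The asymmetry described above can be reproduced with a toy model in plain Python. This is a sketch under stated assumptions, not RDKit's actual code path: `ToyAtom`, `unpatched_match` and `first_mismatch` are hypothetical names, a dummy that has been made into a query is assumed to match any atom, a plain atom is assumed to match only the same element, and atoms are paired in SMILES order rather than by a real substructure search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an RDKit atom (not part of the RDKit API)."""
    atomic_num: int
    is_query: bool = False  # assumed True only for dummies after makeDummiesQueries


def unpatched_match(query_atom: ToyAtom, target_atom: ToyAtom) -> bool:
    # Assumption: a dummy that was turned into a query matches any target atom.
    if query_atom.is_query and query_atom.atomic_num == 0:
        return True
    # Assumption: a plain atom only matches the same element, so it
    # never matches a dummy sitting in the *target* molecule.
    return query_atom.atomic_num == target_atom.atomic_num


def first_mismatch(query: List[ToyAtom], target: List[ToyAtom]) -> Optional[int]:
    """Pair atoms in SMILES order (both mols share one skeleton).

    Returns the index of the first pair that fails, or None if all pairs match.
    """
    for i, (q, t) in enumerate(zip(query, target)):
        if not unpatched_match(q, t):
            return i
    return None


DUMMY = ToyAtom(0, is_query=True)
C, F = ToyAtom(6), ToyAtom(9)
m1 = [DUMMY, C, C, C, C]  # [*:1][CH2:2][C:3]([CH3:4])=[CH2:5]
m2 = [F, C, C, DUMMY, C]  # [F:11][CH2:12][C:13]([*:14])=[CH2:15]

print(first_mismatch(m2, m1))  # 0 (plain F:11 cannot match the dummy *:1)
print(first_mismatch(m1, m2))  # 3 (plain CH3:4 cannot match the dummy *:14)
```

In this model each direction fails on exactly the pair where a plain atom on the query side meets a dummy on the target side.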
Potential ideas to make this work as I’d like:
1. Override atom.Match in the python code – not sure that
this would work since the C++ version of this function is
what would be called during GetSubstructMatches
2. Override atom.Match in the C++ code – not quite sure how
to do this, or what side effects it might have. Ideally
the changes I make would only affect this example (and
other similar ones)
3. Make all atoms in both molecules QueryAtoms, but otherwise
leave them unchanged. I’m not quite sure how to do this!
Does anyone have any ideas for what the best approach here
would be, or knows if there is already built in functionality
for something like this? I’d prefer to not use SMARTS to
construct my molecules if possible, since I don’t really think
of them as queries, just as other molecules in the system that
happen to not be fully specified.
- Kovas
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss