Hi,
I am trying to use RDKit to replace the parts of a molecule that match given 
SMARTS patterns with a wildcard (*), and to return a SMARTS string of which the 
original molecule is still an instance.
I tried the following:

########
from rdkit import Chem

def generate_modified_smarts(smiles, smarts_patterns, num_patterns_to_replace):
    molecule = Chem.MolFromSmiles(smiles)
    patterns_replaced = 0

    for smarts in smarts_patterns:
        if patterns_replaced >= num_patterns_to_replace:
            break

        pattern = Chem.MolFromSmarts(smarts)
        while molecule.HasSubstructMatch(pattern) and patterns_replaced < num_patterns_to_replace:
            match_indices = molecule.GetSubstructMatch(pattern)

            # Extract segments before and after the match
            before_match, after_match = "", ""
            if match_indices[0] > 0:
                before_match = Chem.MolFragmentToSmarts(
                    molecule, atomsToUse=list(range(match_indices[0])))
            if match_indices[-1] < molecule.GetNumAtoms() - 1:
                after_match = Chem.MolFragmentToSmarts(
                    molecule,
                    atomsToUse=list(range(match_indices[-1] + 1, molecule.GetNumAtoms())))

            # Combine parts with a wildcard
            modified_smarts = before_match + '*' + after_match
            molecule = Chem.MolFromSmarts(modified_smarts)
            patterns_replaced += 1

    return Chem.MolToSmarts(molecule)


example_smiles = "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)NC(=O)C=CCN(C)C"
smarts_patterns = ["C=O", "C#N"]
num_patterns_to_replace = 2
modified_smarts = generate_modified_smarts(example_smiles, smarts_patterns, num_patterns_to_replace)
print(f"Modified molecule SMARTS pattern: {modified_smarts}")
########

While it seems to work for C=O, it does not work for C#N: the connectivity is 
messed up for C#N even if I use it alone, i.e. without the carbonyl. The 
matched patterns could be anywhere in the molecule and could be more complex 
than these, but I tried some simple cases first to see how robust this 
approach is. It worked for "CCO", but did not work when I tried "Cl".
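
My suspicion is that the string splicing itself is the weak point: the atom
ranges before and after the match are written out as separate fragments, so
the bonds that connected them to the matched atoms are lost, and joining the
two strings around '*' does not put those bonds back. For comparison, here is a
rough sketch of a different route that edits the molecule graph directly
instead of concatenating SMARTS strings. It rests on two assumptions that are
mine and may not fit every use case: each matched atom gets its own wildcard
(rather than the whole match collapsing into a single '*'), and overlapping
matches are skipped. The function name wildcard_matched_atoms is made up for
illustration. Because atom indices and bonds are left untouched, the original
molecule should remain a match for the returned SMARTS.

```python
from rdkit import Chem


def wildcard_matched_atoms(smiles, smarts_patterns, max_replacements):
    """Swap every atom of up to `max_replacements` matches for a '*' query atom.

    Hypothetical helper: a sketch, not an established RDKit recipe.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Collect matches on the unmodified molecule so atom indices stay valid.
    atoms_to_replace = set()
    n_replaced = 0
    for smarts in smarts_patterns:
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is None:
            raise ValueError(f"could not parse SMARTS: {smarts!r}")
        for match in mol.GetSubstructMatches(pattern):
            if n_replaced >= max_replacements:
                break
            if atoms_to_replace.intersection(match):
                continue  # assumption: skip matches overlapping a chosen one
            atoms_to_replace.update(match)
            n_replaced += 1

    # ReplaceAtom keeps the atom's index and its bonds, so connectivity
    # (including ring closures) is preserved.
    rwmol = Chem.RWMol(mol)
    for idx in atoms_to_replace:
        rwmol.ReplaceAtom(idx, Chem.AtomFromSmarts("*"))

    return Chem.MolToSmarts(rwmol)


example_smiles = (
    "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)"
    "NC(=O)C=CCN(C)C"
)
result = wildcard_matched_atoms(example_smiles, ["C=O", "C#N"], 2)
print(f"Modified molecule SMARTS pattern: {result}")

# Sanity check: the original molecule should still match the generalized query.
original = Chem.MolFromSmiles(example_smiles)
print(original.HasSubstructMatch(Chem.MolFromSmarts(result)))
```

If a single wildcard per match is required, the matched atoms would have to be
merged into one atom, and then the original molecule is only guaranteed to
match when all outside neighbours attach to the same matched atom.
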
I am wondering whether this is something you can help with.

Marawan