The OpenBabel toolkit can generate canonical SMILES for fragments; this
is a key part of its SMILES generator. It keeps the context of each atom
and writes partial aromaticity. For example, if you have Oc1ccccc1O and
you specify atoms 1, 2, 7, and 8, it will write "OccO".
OBMol
Two points to note:
1. If you break a bond, you should increase the implicit H count of the
attached atoms by the bond order. Otherwise you end up with radicals, as
you've seen.
2. If you copy the substructure instead of fragmenting, then the process
may be simpler as there is an option to adjust
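
To make point 1 concrete, here is a minimal sketch of the bookkeeping in
plain Python. It does not use the OpenBabel API: the Atom and Mol classes
and the break_bond helper are hypothetical names invented for this
illustration, and the implicit H counts are set by hand rather than
perceived from valence.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    symbol: str
    implicit_h: int = 0


@dataclass
class Mol:
    # Toy molecule (an assumption for illustration, not OpenBabel's OBMol).
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)  # frozenset({i, j}) -> bond order

    def add_atom(self, symbol, implicit_h=0):
        self.atoms.append(Atom(symbol, implicit_h))
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def break_bond(self, i, j):
        # Remove the bond, then raise the implicit H count of both end
        # atoms by the bond order so that neither is left as a radical.
        order = self.bonds.pop(frozenset((i, j)))
        for idx in (i, j):
            self.atoms[idx].implicit_h += order
        return order


# Ethanol, CCO: CH3-CH2-OH
mol = Mol()
c1 = mol.add_atom("C", 3)
c2 = mol.add_atom("C", 2)
o = mol.add_atom("O", 1)
mol.add_bond(c1, c2)
mol.add_bond(c2, o)

# Break the C-C single bond: the pieces become CH4 and CH3OH.
mol.break_bond(c1, c2)
h_counts = [a.implicit_h for a in mol.atoms]
print(h_counts)
```

Skipping the increment in break_bond would leave both carbons one
hydrogen short, which is the radical case described in point 1.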
Hi Noel,
I updated the fragmentation code.
My code is available at
https://github.com/n-yoshikawa/contributed/blob/master/c%2B%2B/fragments/obfragment.cpp
I enumerated the ten most frequent fragments in our data using this code.
The result was as follows:
SMILES percent
[C]1=CC=[C]C=C1