Thank you for sharing your results, Alexis. This is indeed an interesting
problem.
I just wonder what the 339 FPs are. Are they all English words with fewer
than 6 characters? If RDKit can construct a molecule out of them, I suppose
in theory they could be valid SMILES?
Looks like the problem with
On Dec 5, 2016, at 3:28 PM, Alexis Parenty wrote:
> For the parenthesis issue, the difficulty is to differentiate the SMILES
> formats (xxx)(xxx) from this one (xxx)… I will try and address
> that using something like:
I do not understand. The first one is not a SMILES format.
Can y
It's not something that RDKit can do - RDKit is focused more on small
organic molecules, rather than biomacromolecules.
For DNA, if all you want is an idealized B-form double helix, there are a
number of programs out there that can take in a sequence and make an ideal
(or almost-ideal) structure fr
Hi Alexis,
While you're wrestling with the difference between () and CC(C)C you
could also consider that . in a SMILES is valid, and denotes a mixture, for
example CCO.O.O (for vodka, maybe). You might get those in FDA documents
that discuss formulations, for example. In a well scanned and p
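To illustrate the dot notation mentioned above, here is a minimal sketch in
plain Python that splits a dot-separated SMILES into its components and
counts them. The function name `split_components` is hypothetical, and the
sketch assumes that no ring-closure digits are shared across a dot (SMILES
allows that, and a plain split would then cut one molecule into pieces).

```python
from collections import Counter


def split_components(smiles):
    """Split a dot-separated SMILES into its non-empty components.

    Assumption: ring-closure digits are not shared across a dot.
    """
    return [part for part in smiles.split(".") if part]


components = split_components("CCO.O.O")
composition = Counter(components)

print(components)        # ['CCO', 'O', 'O']
print(composition["O"])  # 2
```

Each component could then be checked separately by whatever SMILES
validator the script already uses.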
Oops! Thanks Brian and Igor! I did not understand at first the punctuation
issues raised yesterday by Andrew, with SMILES that could be quoted inside
parentheses or at the end of a sentence next to a full stop or a semicolon.
I see it now. I should remove the punctuation filter.
For the parenthes
Cool! Btw, try sanitize=False
Also, Andrew is right that you will miss parenthetical phrases, e.g.
Benzene(c1c1) and the like. Just reasserting that this is a hard problem!
Brian Kelley
> On Dec 5, 2016, at 5:35 AM, Alexis Parenty
> wrote:
>
> Dear All,
> Many thanks to everyon
On Dec 5, 2016, at 11:35 AM, Alexis Parenty wrote:
> I have tested my script on:
> • 7900 unique SMILES for “drug-like molecules”
> • Alice’s Adventures in Wonderland (I never read the book but I assumed
> there are no SMILES in it!)
> • A shuffled mixture of Alice’s in wonderland and 7900 uni
Alexis,
Nice, but it doesn't seem to take into account Andrew Dalke's comment that
a valid SMILES may be adjacent to a punctuation mark (e.g. a period or a
parenthesis).
Perhaps it is not an issue for your specific project, but maybe instead of
simple "split()" it is worthwhile to use something more sop
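One possible way to go beyond a plain split(), sketched in plain Python
under stated assumptions: for each whitespace-separated token, generate a
few candidate strings (the token as is, the token without trailing
sentence punctuation, and the token without a pair of wrapping
parentheses). The names `smiles_candidates` and `_outer_parens_match` are
hypothetical, and the set of trailing punctuation characters is an
assumption. The sketch only produces candidates; each one would still have
to be passed to a real SMILES validator.

```python
# Assumption: these are the characters that may trail a SMILES in prose.
TRAILING_PUNCTUATION = ".,;:!?"


def _outer_parens_match(s):
    """Return True if the '(' at s[0] is closed by the ')' at s[-1]."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # The first opening parenthesis closes here.
                return i == len(s) - 1
    return False


def smiles_candidates(token):
    """Return candidate strings for one whitespace-separated token."""
    candidates = [token]

    # "CCO." at the end of a sentence -> also try "CCO".
    stripped = token.rstrip(TRAILING_PUNCTUATION)
    if stripped and stripped != token:
        candidates.append(stripped)

    # "(CCO)" quoted in parentheses -> also try "CCO",
    # but leave "(C)(C)" alone because its parentheses do not wrap it.
    core = stripped or token
    if len(core) > 2 and core[0] == "(" and _outer_parens_match(core):
        candidates.append(core[1:-1])

    return candidates


text = "Ethanol (CCO), isobutane CC(C)C and vodka CCO.O.O."
for token in text.split():
    print(token, smiles_candidates(token))
```

Keeping the original token as the first candidate means a string such as
CC(C)C or CCO.O.O is never damaged by the punctuation handling.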
Dear All,
Many thanks to everyone for your participation in that discussion. It
was very interesting and useful. I have written a small script that
takes on board everyone’s input:
This incorporates a few "text filters" before the RDKit function:
First of all I made a dictionary of all the words p