date:20161205

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Ling Chan

Thank you for sharing your results, Alexis. This is indeed an interesting problem. I just wonder what the 339 FPs are. Are they all English words with fewer than 6 characters? If RDKit can construct a molecule out of them, I suppose in theory they could be valid SMILES? Looks like the problem with

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Andrew Dalke

On Dec 5, 2016, at 3:28 PM, Alexis Parenty wrote:
> For the parenthesis issue, the difficulty is to differentiate the SMILES formats (xxx)(xxx) from this one (xxx)… I will try and address that using something like:

I do not understand. The first one is not a SMILES format. Can y

Re: [Rdkit-discuss] File Conversion?

2016-12-05 Thread Rocco Moretti

It's not something that RDKit can do - RDKit is focused more on small organic molecules, rather than biomacromolecules. For DNA, if all you want is an idealized B-form double helix, there's a number of programs out there which can take in a sequence and make an ideal (or almost-ideal) structure fr

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread David Cosgrove

Hi Alexis, While you're wrestling with the difference between () and CC(C)C you could also consider that . in a SMILES is valid, and denotes a mixture, for example CCO.O.O (for vodka, maybe). You might get those in FDA documents that discuss formulations, for example. In a well scanned and p
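The point about dots can be illustrated with a minimal sketch. It is not code from the thread, and the helper names `strip_sentence_period` and `components` are hypothetical. It assumes a dot at the very end of a token is a sentence full stop, while dots inside the token separate the parts of a SMILES such as CCO.O.O.

```python
def strip_sentence_period(token):
    """Drop dots at the end of a token; dots inside the token are kept."""
    return token.rstrip(".")


def components(smiles):
    """Split a dot-separated SMILES string into its dot-separated parts."""
    return [part for part in smiles.split(".") if part]


token = "CCO.O.O."                  # SMILES followed by a sentence full stop
core = strip_sentence_period(token)
print(core)                         # CCO.O.O
print(components(core))             # ['CCO', 'O', 'O']
```

One caveat: ring-closure digits can pair up across a dot (C1.C1 is a single molecule), so the parts returned here are only text fragments, and the whole string should still be handed to a real SMILES parser.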

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Alexis Parenty

Oops! Thanks Brian and Igor! I did not understand at first the punctuation issues that Andrew referred to yesterday, with SMILES that could be quoted inside parentheses or at the end of a sentence next to a full stop or a semicolon. I see it now. I should remove the punctuation filter. For the parenthes

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Brian Kelley

Cool! Btw, try sanitize=False. Also, Andrew is right that you will miss parenthetical phrases, i.e. Benzene(c1c1) and the like. Just reasserting that this is a hard problem!

Brian Kelley

> On Dec 5, 2016, at 5:35 AM, Alexis Parenty wrote:
>
> Dear All,
> Many thanks to everyon
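The parenthetical case can be sketched as follows. This is an illustration only, not the script discussed in the thread; the function name `parenthetical_candidates` is hypothetical, and the example uses the full benzene SMILES c1ccccc1. The idea is to find the "(" that matches the final ")" of a token and offer both the text before it and the text inside it as candidates for a SMILES parser to accept or reject.

```python
def parenthetical_candidates(token):
    """Return the text before and inside a trailing '(...)' group, if any."""
    token = token.rstrip(".,;:")        # assumed sentence punctuation
    if not token.endswith(")"):
        return []
    # Walk backwards to find the "(" that matches the final ")".
    depth = 0
    for i in range(len(token) - 1, -1, -1):
        if token[i] == ")":
            depth += 1
        elif token[i] == "(":
            depth -= 1
            if depth == 0:
                head, inner = token[:i], token[i + 1:-1]
                return [part for part in (head, inner) if part]
    return []                           # parentheses are unbalanced


print(parenthetical_candidates("Benzene(c1ccccc1)"))  # ['Benzene', 'c1ccccc1']
print(parenthetical_candidates("CC(C)C"))             # []
```

A token such as CC(C)(C) also ends in ")" and would be split, so the original token should always remain a candidate alongside the pieces.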

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Andrew Dalke

On Dec 5, 2016, at 11:35 AM, Alexis Parenty wrote:
> I have tested my script on:
> • 7900 unique SMILES for “drug-like molecules”
> • Alice’s adventure in wonderland (I never read the book but I assumed there is no SMILES!)
> • A shuffled mixture of Alice’s in wonderland and 7900 uni

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Igor Filippov

Alexis, Nice, but it doesn't seem to take into account Andrew Dalke's comment that valid SMILES may be adjacent to a punctuation mark (e.g. period or parenthesis). Perhaps it is not an issue for your specific project, but maybe instead of simple "split()" it is worthwhile to use something more sop
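One possible way to go beyond a plain split() is sketched below. It is an assumption-laden illustration rather than anyone's actual script, and the names `candidate_strings` and `candidates_from_text` are hypothetical. Each whitespace-separated token is kept as-is, and a second variant is added with surrounding quotes, trailing sentence punctuation, a leading "(" and any unmatched closing ")" peeled off. Treating a leading "(" as prose assumes that a SMILES string starts with an atom.

```python
SENTENCE_PUNCTUATION = ".,;:!?"   # assumed set of trailing marks
QUOTES = "\"'"


def candidate_strings(token):
    """Return the token plus a variant with prose punctuation peeled off."""
    variants = [token]
    core = token.strip(QUOTES).rstrip(SENTENCE_PUNCTUATION)
    # A leading "(" is treated as prose (assumption stated above).
    while core.startswith("("):
        core = core[1:]
    # Drop closing parentheses that have no matching "(" left.
    while core.endswith(")") and core.count(")") > core.count("("):
        core = core[:-1]
    core = core.rstrip(SENTENCE_PUNCTUATION)
    if core and core not in variants:
        variants.append(core)
    return variants


def candidates_from_text(text):
    """Collect unique candidate strings from free text, in reading order."""
    seen = []
    for token in text.split():
        for variant in candidate_strings(token):
            if variant not in seen:
                seen.append(variant)
    return seen


print(candidates_from_text("Ethanol (CCO) is polar."))
# ['Ethanol', '(CCO)', 'CCO', 'is', 'polar.', 'polar']
```

Every candidate would still need to go through a SMILES parser such as RDKit's; the sketch only widens the set of strings that get tested.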

Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-05 Thread Alexis Parenty

Dear All, Many thanks to everyone for your participation in that discussion. It was very interesting and useful. I have written a small script that takes on board everyone’s input. It incorporates a few "text filters" before the RDKit function: First of all I made a dictionary of all the words p

9 matches
