On 24/07/2014 17:06, Wallace Chan wrote:
> Tim,
>
> Thanks for your reply. Yes, we have the canonical SMILES strings stored
> as properties in our glass.sdf file. I tried generating canonical SMILES
> for the results, and they are different from ours, so ours were
> probably generated with a different canonicalization algorithm.

It is possible to recover your canonical SMILES from glass.sdf and add
them to the titles in the results file:

  obabel glass.sdf -ifs -O results.smi -sc1ccccc1 --append SMILESSTRING

where SMILESSTRING is the name of the SD property.

You could also construct a file containing just your canonical SMILES:

  obabel glass.sdf -ifs -O results.smi -otxt -sc1ccccc1 --title "" --append "SMILESSTRING"

For each matching molecule, the txt output format writes only the title;
--title "" blanks the title, and --append then adds the stored SMILES.
Other properties or descriptors can be added as well, e.g.

  --append "SMILESSTRING inchi"

> This then leads to another question that has come to me. Does the input
> for substructure or similarity searching have to be in SDF format, or can
> it be another format, such as a list of InChI IDs? In other words, does
> the fast search index have to come from an SDF file? Many thanks.

The datafile (and the output query results) can be in any format,
including InChI.

Chris

>
> On Tue, Jul 22, 2014 at 7:44 PM, Tim Vandermeersch
> <tim.vandermeer...@gmail.com> wrote:
>
> > Hi,
> >
> > I assume you have canonical SMILES strings in glass.sdf stored as
> > titles or properties. Correct me if I am wrong. If so, it
> > depends on what program was used to create these canonical SMILES
> > strings. If you used openbabel for this, you can convert the
> > molecules in result.smi to openbabel canonical SMILES (or write
> > canonical SMILES directly using the .can extension).
> >
> > If another program was used to generate the canonical SMILES, it
> > is not possible to use openbabel to generate the same canonical
> > SMILES starting from result.smi. If you have access to the other
> > program, you could use it to convert result.smi to its canonical
> > SMILES and use these to search glass.sdf.
> >
> > The reason for this is that there is no universal SMILES
> > canonicalization algorithm.
> > Different toolkits produce different canonical SMILES (which are
> > canonical only within the same toolkit). InChI, on the other hand,
> > has a single reference implementation.
> >
> > Tim
> >
> > On Wed, Jul 23, 2014 at 12:03 AM, Wallace Chan <walla...@umich.edu> wrote:
> >
> > > Dr. Hutchison,
> > >
> > > Yes, this helps. I do have another question about substructure
> > > searching. We are building a database with roughly 270,000
> > > molecules and want users to be able to do substructure and
> > > similarity searches. I've read the following documentation,
> > > http://openbabel.org/docs/dev/Fingerprints/fingerprints.html,
> > > and it helped me understand how this process works. However, I
> > > want to ask whether the output file from the query can contain
> > > exactly the same SMILES strings that were used to create the fast
> > > search index. Currently, the SMILES strings in the result.smi file
> > > produced by the query are not the canonical SMILES that I used to
> > > create the fast search index. For example, if I were to look for a
> > > benzene substructure with the following command,
> > >
> > >   babel glass.fs -ifs -sc1ccccc1 result.smi
> > >
> > > would I be able to retrieve the SMILES string from glass.sdf,
> > > which was used to create glass.fs? Many thanks for your patience.

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
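Chris's --append SMILESSTRING approach works because the canonical SMILES are already stored as an SD data item in glass.sdf. A minimal pure-Python sketch of what "the stored value of a named SD property" means is shown below. It is not Open Babel's implementation: SMILESSTRING is the placeholder property name from the thread, while the function names and the two-record sample are hypothetical. It does no substructure matching, so it lists every record rather than only the hits selected by -s.

```python
# Minimal sketch (not Open Babel code): read a named SD data item, such as
# the placeholder "SMILESSTRING" property, from every record of an SD file.
from typing import Iterator, List, Optional, Tuple


def iter_sd_records(text: str) -> Iterator[List[str]]:
    """Yield each SD record as a list of lines; records end with '$$$$'."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # tolerate a missing final $$$$
        yield record


def get_sd_field(record: List[str], field: str) -> Optional[str]:
    """Return the (trimmed) value of data item <field>, or None if absent."""
    tag = f"<{field}>"
    for i, line in enumerate(record):
        # Data item headers start with '>' and contain the name in <...>
        if line.startswith(">") and tag in line:
            value_lines = []
            for data_line in record[i + 1:]:
                if not data_line.strip():  # a blank line ends the data item
                    break
                value_lines.append(data_line.strip())
            return "\n".join(value_lines)
    return None


def stored_smiles(text: str, field: str = "SMILESSTRING") -> List[Tuple[str, Optional[str]]]:
    """Return (title, stored value) pairs; the title is a record's first line."""
    return [(rec[0].strip() if rec else "", get_sd_field(rec, field))
            for rec in iter_sd_records(text)]


# Hypothetical two-record SD file with empty connection tables
SAMPLE = "\n".join([
    "benzene", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <SMILESSTRING>", "c1ccccc1", "",
    "$$$$",
    "phenol", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <SMILESSTRING>", "Oc1ccccc1", "",
    "$$$$",
])

if __name__ == "__main__":
    # Print "stored SMILES<TAB>title", similar in spirit to a .smi line
    for title, smiles in stored_smiles(SAMPLE):
        print(f"{smiles}\t{title}")
```

Run on the sample, this prints c1ccccc1 for benzene and Oc1ccccc1 for phenol, i.e. the stored strings rather than any freshly canonicalized SMILES.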
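The fingerprint documentation Wallace links describes the idea behind a fast search index: a molecule can contain the query substructure only if its fingerprint has every bit that the query's fingerprint has, and similarity is ranked by the Tanimoto coefficient. A toy Python sketch of those two tests follows. The 8-bit fingerprints and the molecule names are hypothetical (real indexes use much longer fingerprints), and the final full substructure match that must follow the screen is omitted.

```python
# Toy sketch of fingerprint screening and similarity ranking.
# Fingerprints are hypothetical 8-bit patterns stored as Python ints.
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def may_contain(query: int, target: int) -> bool:
    """Substructure screen: the target must have every bit the query has.

    Passing the screen does not prove a match; candidates still need a
    full substructure check, which this sketch omits.
    """
    return (query & ~target) == 0


def screen(query: int, index: Dict[str, int]) -> List[str]:
    """Names of molecules that survive the fingerprint screen."""
    return [name for name, fp in index.items() if may_contain(query, fp)]


def similar(query: int, index: Dict[str, int], threshold: float) -> List[Tuple[str, float]]:
    """(name, Tanimoto) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in index.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical index: molecule name -> fingerprint
INDEX = {"mol1": 0b1011_0110, "mol2": 0b0011_0100, "mol3": 0b1100_0001}
QUERY = 0b0011_0100

if __name__ == "__main__":
    print(screen(QUERY, INDEX))
    print(similar(QUERY, INDEX, 0.5))
```

With these toy bit patterns the screen keeps mol1 and mol2 (mol3 lacks the query's bits), and the similarity search ranks mol2 (Tanimoto 1.0) above mol1 (0.6). The screen is cheap bitwise arithmetic, which is why it scales to a few hundred thousand molecules before the expensive per-molecule match runs.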