[Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-10 Thread David García Aristegui
Hello, I'm reading about substructure and similarity searches... http://openbabel.org/docs/dev/Fingerprints/fingerprints.html "On larger datasets it is necessary to first build a fastsearch index. This is a new file that stores a database of fingerprints for the files indexed. You will still need

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-10 Thread Craig A. James
On 11/10/11 7:44 AM, David García Aristegui wrote:
> Hello, i'm reading about substructure and similarity searches...
> http://openbabel.org/docs/dev/Fingerprints/fingerprints.html
>
> "On larger datasets it is necessary to first build a fastsearch index.
> This is a new file that stores a database

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread David García Aristegui
But if I need to design a chemical structure database, is the way to do it just to store the SMILES or structure "id" in a table, and for the searches just work with the .fs binary file? Is it a good option to store in a table the id, SMILES, fp2 fingerprint and .fs for each structure, regarding t

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread Ernst-Georg Schmid
Hello,
>Does anyone know what is the best way to store the fastsearch index (to
>reuse it) in a chemical structures database? best field type to store it?
>(i'm using MySQL).
you could:
- store it outside the database in a file with pointers to the records in the database as Craig A. James sugg

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread David García Aristegui
"store it alongside the database records as mychem does it"
Hmm, where? The MyChem database schema (obserialized field) is unclear to me.
A good example of a chemical structure database is MolDB, by the way! http://merian.pch.univie.ac.at/~nhaider/cheminf/moldb5.html
Best regards.
> Hello,

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread Nina Jeliazkova
On 11 November 2011 11:36, David García Aristegui wrote:
> "store it alongside the database records as mychem does it" mmm,
> where? for me is unclear the MyChem database schema (obserialized field).
>
> A good example of chemical structures database is MolDB, by the way!!!
> http://merian.pch

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread Ernst-Georg Schmid
Hello,
>"store it alongside the database records as mychem does it" mmm,
>where? for me is unclear the MyChem database schema (obserialized field).
'fp2' is the fp2 fingerprint byte[] as a MySQL BLOB.
'obserialized' is the serialized binary representation of an OBMol for performance reasons
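A fingerprint stored as a raw byte array can be compared directly once it has been read back from the BLOB column. The following is a minimal sketch of a Tanimoto similarity on two such byte strings, assuming both fingerprints have the same length and arrive in Python as `bytes`. The function names are illustrative only; they are not part of Mychem or Open Babel.

```python
def popcount(data: bytes) -> int:
    """Count the bits set to 1 in a byte string."""
    return bin(int.from_bytes(data, "big")).count("1")


def tanimoto(fp_a: bytes, fp_b: bytes) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either.

    Assumption: fp_a and fp_b are equal-length fingerprints, e.g. two
    values read from a BLOB column (hypothetical usage).
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    common = bytes(a & b for a, b in zip(fp_a, fp_b))
    either = bytes(a | b for a, b in zip(fp_a, fp_b))
    union = popcount(either)
    # Two empty fingerprints share nothing; report 0.0 rather than divide by zero.
    return popcount(common) / union if union else 0.0


# Toy 1024-bit fingerprints that differ only in their first byte.
fp_a = bytes([0b00001111]) + bytes(127)
fp_b = bytes([0b00111100]) + bytes(127)
similarity = tanimoto(fp_a, fp_b)  # 2 shared bits out of 6 set overall
```

Doing the comparison in application code like this means every candidate fingerprint has to be fetched from the database first, which is one reason to look at in-database alternatives.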

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread David García Aristegui
Thank you very much for the info!!!
> Hello,
>
>>"store it alongside the database records as mychem does it" mmm,
>>where? for me is unclear the MyChem database schema (obserialized field).
>
> 'fp2' is the fp2 fingerprint byte[] as a MySQL BLOB.
> 'obserialized' is the serialized binary repre

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread David García Aristegui
Nina, thank you very much for the links and info!!!
> On 11 November 2011 11:36, David García Aristegui
> wrote:
>
>> "store it alongside the database records as mychem does it" mmm,
>> where? for me is unclear the MyChem database schema (obserialized
>> field).
>>
>> A good example of chemica

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-11 Thread Steffen Neumann
Hi,
I can just say that pgchem::tigress performs pretty well:
http://theplateisbad.blogspot.com/2010/11/pgchemtigress-sets-new-world-record.html
We had a rough time initially, because a SEGV in Open Babel crashed the whole database during the import, but now we have the whole of PubChem in there.

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-12 Thread Jérôme Pansanel
Hi,
On Fri., 2011-11-11 at 10:36 +0100, David García Aristegui wrote:
> "store it alongside the database records as mychem does it" mmm,
> where? for me is unclear the MyChem database schema (obserialized field).
The Mychem project proposes a simple database schema for compound management. Th

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-14 Thread David García Aristegui
I'm more comfortable using MySQL, so I prefer to use projects with this RDBMS. In MyMolDB "(...)the Open Babel binary fingerprints and fingerprint bits of the molecules were precalculated and divided into 32 segments, these segments were converted to decimal numbers and stored in a mol_fp table, o
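The segmentation described in the MyMolDB quote can be sketched as follows: a 1024-bit fingerprint split into 32 segments gives 32 bits per segment, and each segment then fits an ordinary unsigned integer column. This is only an illustration under assumptions. The big-endian byte order and the function names are hypothetical, and the actual MyMolDB layout may differ.

```python
from typing import List

SEGMENTS = 32          # number of segments, as in the MyMolDB description
BITS_PER_SEGMENT = 32  # 1024 bits / 32 segments
BYTES_PER_SEGMENT = BITS_PER_SEGMENT // 8


def split_fingerprint(fp: bytes) -> List[int]:
    """Split a 1024-bit fingerprint into 32 unsigned 32-bit integers.

    Assumption: big-endian order within each segment (hypothetical choice).
    """
    if len(fp) * 8 != SEGMENTS * BITS_PER_SEGMENT:
        raise ValueError("expected a 1024-bit (128-byte) fingerprint")
    return [
        int.from_bytes(fp[i:i + BYTES_PER_SEGMENT], "big")
        for i in range(0, len(fp), BYTES_PER_SEGMENT)
    ]


def join_segments(segments: List[int]) -> bytes:
    """Inverse of split_fingerprint: rebuild the 128-byte fingerprint."""
    return b"".join(s.to_bytes(BYTES_PER_SEGMENT, "big") for s in segments)


# Toy fingerprint: bytes 0x00..0x7F, just to have a recognisable pattern.
fingerprint = bytes(range(128))
segments = split_fingerprint(fingerprint)
restored = join_segments(segments)
```

Each value in `segments` is below 2^32, so it can be written to the database as a plain decimal number, one column (or row) per segment.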

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-14 Thread Ernst-Georg Schmid
Hello, I'd say that the reason for choosing this storage method was a technical decision. Since an unfolded FP2 is 1024 bits long (1021 actually used) it doesn't fit into the largest integer datatype of MySQL, UNSIGNED BIGINT, which holds only 64 bits (values up to 2^64 - 1). So you either have to store it in a BLOB, but then you
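Once the fingerprint is held as integer segments, each no wider than an integer column, a substructure pre-screen reduces to a bitwise AND per segment: a target can only contain the query if every bit set in the query fingerprint is also set in the target. The sketch below shows that test in Python and builds the equivalent SQL condition. The column names `fp1`, `fp2`, ... are hypothetical and are not taken from MyMolDB, Mychem or any other schema mentioned in this thread.

```python
from typing import List


def passes_screen(query: List[int], target: List[int]) -> bool:
    """True if every bit set in the query segments is also set in the target."""
    if len(query) != len(target):
        raise ValueError("segment lists must have the same length")
    return all((q & t) == q for q, t in zip(query, target))


def screen_where_clause(query: List[int], prefix: str = "fp") -> str:
    """Build a SQL condition for the same screen.

    Assumption: segments live in columns named fp1, fp2, ... (hypothetical).
    Segments that are zero in the query constrain nothing and are skipped.
    """
    terms = [
        f"({prefix}{i} & {q}) = {q}"
        for i, q in enumerate(query, start=1)
        if q != 0
    ]
    return " AND ".join(terms) if terms else "TRUE"


query_segments = [0b0101, 0, 0b0011]
target_segments = [0b1101, 0b0001, 0b0111]
hit = passes_screen(query_segments, target_segments)
condition = screen_where_clause(query_segments)
```

A screen like this only discards molecules that cannot match. The rows that pass still need a real substructure match, because different fragments can set the same fingerprint bit.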

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-14 Thread David García Aristegui
Very very interesting. Thank you very much indeed for all the information. Best regards.
> Hello,
>
> I'd say that the reason for choosing this storage method was a technical
> decision. Since an unfolded FP2 is 1024 bits long (1021 actually used) it
> doesn't fit into the largest integer datatype

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-15 Thread Jérôme Pansanel
Hi,
> Hello,
>
> I'd say that the reason for choosing this storage method was a
> technical decision. Since an unfolded FP2 is 1024 bits long (1021
> actually used) it doesn't fit into the largest integer datatype of
> MySQL, UNSIGNED BIGINT which is 2^64. So you either have to store it
> in a

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-15 Thread Andrew Dalke
On Nov 14, 2011, at 1:47 PM, Ernst-Georg Schmid wrote:
>> Since an unfolded FP2 is 1024 bits long (1021
>> actually used) it doesn't fit into the largest integer datatype of
>> MySQL, UNSIGNED BIGINT which is 2^64. So you either have to store it
>> in a BLOB, but then you have to deal with BLOB i

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-16 Thread Ernst-Georg Schmid
Hello,
>InChI-Key are exactly done for exact searching.
Exactly. Unlike the different SMILES canonicalization implementations, the InChI algorithm is standardized. And despite the fact that the first InChI-Key collisions have been found, at least there is an estimate of their collision probability.

Re: [Open Babel] Fastsearch format (fs) stored in a chemical structure database

2011-11-16 Thread Craig A. James
On 11/16/11 3:55 AM, Ernst-Georg Schmid wrote:
>> InChI-Key are exactly done for exact searching.
>
> exactly, unlike different SMILES canonicalization implementations
> the InChI algorithm is standardized.
In my opinion, this is a common but incorrect criticism of SMILES. Variations in canoni