Keith,

Thank you for the thoughtful and informative response. It is very helpful to
have a better background of how the underlying technology works. I gather
that I probably don't need the AUTOINCREMENT keyword. Since I am inserting
the value of "id" from Structure in to other tables (as Structure_id) for
the purposes of keeping track of data from the same record spread across
multiple tables, it would seem as if using an explicitly defined id primary
key (instead of the implicit rowid) makes sense for the Structure table. I
don't think that the FOREIGN KEY reference to Structure(id) in the satellite
tables provides an enormous amount of protection, but there are some
situations where I can see it as being helpful and it looks like that could
not be done with rowid.

> or have caught a nasty case of the object-oriented disease...
It's funny that you should say that. I had a conversation with a friend the
other day who is a database programmer, and he advised having an explicit
primary key in each of my satellite tables that is independent of the
Structure_id column. I as planning on just using Structure_id as the primary
key, but he advised me to keep an different AUTOINCREMENT key because it may
be useful to access rows independent of the Structure_id value. This
programmer frequently uses an OO database system, so I thought your comment
was interesting. What is it about OO that would lead to always explicitly
defining the id?


Igor,
It seem as if you are saying that string values like a name should be passed
in SQL commands as variables, is that right? At the moment, I am accessing
SQLite through the ruby-sqlite3 interface, so the inserts look like,

@db.execute "INSERT INTO #{table_name.capitalize}
VALUES(#{insert_into_fields(table_name, values_array)})"

(I haven't got to queries yet)

the data is contained in the values_array, but I guessing that will still
create a problem if there are single quotes in one of the values? I have
looked at the link you sent (and to the very funny cartoon, thanks), but it
will take a bit to decipher. Since I am using the ruby interface and not the
C interface, it may be that I need to look at the ruby-sqlite3 doc, but I am
amusing/hoping that there is something similar. It looks like you set the
value of a parameter to your string value and then pass the parameter
instead of the literal string. Is that more or less the idea? I guess it
would be easy enough to double the quotes in the ruby code before inserts or
queries, as Simon suggested, and then un-double them on the way out if data
was being retrieved from the database to be written somewhere else. The
single quotes in the names have specific chemical meaning.


On 13 Nov 2012, at 5:41pm, Simon Slavin-3 wrote: 
> By the way, can I ask what iH2, iH3, iH4 are ?  I think I've figured out
> iH1.

Yes iH1 is an intergerized form of the molecular weight rounded to two
decimal places (float*100 with a round off correction on the resulting int).
Each of the other three is the sum of a class of molecular structure
descriptors.

The iH2 sums a group of descriptors based on an atom level quantification of
the valence electron density at each atom (similar to partial charge). These
atom level indices have been grouped in various ways, such as by functional
group, atom-type, and bond type. There are ~1000 of these different indices
that are summed and intergerized.  This sum varies a great deal with the
size and structure features of the molecule. There will, of course, be
instances where two different compounds have the same value for the iH2 sum,
as is true with the iH1 molecular weight key.

The iH3 sum is also a sum of structure indices, but these indices relate to
the connections of the molecular graph, such as the number and size of
rings, fused rings, branch points, paths of various lengths, and the
location of heteroatoms in these subgraphs. The iH3 indices are relatively
independent of the iH2 and serve to discriminate between compounds that may
have identical iH2 sums.

The iH4 key is a sum of indices that describe the overall shape and
connection complexity of the molecule (elongated/globular, highly
interconnected, etc.).

As far as I have been able to determine, this set of 4 ints (including the
mw int) is unique for any compound. Since names are problematic for
identifying a compound (there can be dozens of legal names and they have
characters that are difficult to deal with, especially where multiple
software applications are used in a tool chain), I am looking into how this
set of ints can be used to register a compound into a database instead of
some other identifier. These ints are produced by software that processed
chemical structure files (molecular structures stored in text files). I
receive such files from vendors, and not only can the compound have a
variety of names, it often can be drawn in a variety of forms, so the
process of determining if you already have a record for this compound or not
is non-trivial. Using these int keys will allow me to just process the
structure file with the software the generates the key values and try an
insert. If a record for the compound exists, I will know immediately, even
if the compound was entered under a different name, or the structure was
drawn differently.

I would be happy to explain more about this if you are interested, but
hopefully the above is enough to give a general idea.

Tomorrow I will post a more complete version of the database definitions and
insert syntax.

*LMHmedchem*



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-data-into-mutliple-tables-with-a-composite-primary-key-tp64874p65549.html
Sent from the SQLite mailing list archive at Nabble.com.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to