André Goliath wrote:
Do you (or anyone else) have any experience with those large DBs in terms of
SELECT execution times?
ms, seconds or minutes?
Since I would mainly use field numbers in the WHERE clause I´d create an
index on them I guess?
With an index on the field number and the object id, you should be able
to retrieve a particular field for a particular object in a few ms. To
search through the text of one particular field for all one million
objects will still take a while, even with this index. You will be
reading in and scanning through about 1 GB of data (1E6 objects x 1
field per object x 1E3 bytes per field).
This is where the full text search mechanisms come in handy. Your 1 GB
of data may only contain 10K unique words, and each 1 KB field probably
has only about 200 words. If you create a table of unique words, and
then another table that relates each field to each of its unique words,
you can find all the fields that contain a word very quickly. These
searches can be combined to find all the fields that contain a
combination of words, or any one of a set of words etc.
In your case you might want to keep separate tables to relate the words
of each field_number to the field_data, or add the field number to the
relation table. Either of these will make searching for a combination of
a word in a particular field much faster.
This way you trade off some extra work when you insert the data for more
speed when you read the data.
Once you have a field id, you can get full text of the field and the
object id (so that you can get whatever other info you have for the
object) very quickly.
This all assumes you are trying to match various words in the field
data, and not the entire text of the field data. The later is much simpler.
The client first wanted to store that data in an XML file,
but I was able to convince him to use a DB.
Good, you probably saved yourself a lot of work, and them a lot of
grief. :-)
HTH
Dennis Cote