André Goliath wrote:

Do you (or anyone else) have any experience with those large DBs in terms of
SELECT execution times?
ms, seconds or minutes?

Since I would mainly use field numbers in the WHERE clause I´d create  an
index on them I guess?

With an index on the field number and the object id, you should be able to retrieve a particular field for a particular object in a few ms. To search through the text of one particular field for all one million objects will still take a while, even with this index. You will be reading in and scanning through about 1 GB of data (1E6 objects x 1 field per object x 1E3 bytes per field).

This is where the full text search mechanisms come in handy. Your 1 GB of data may only contain 10K unique words, and each 1 KB field probably has only about 200 words. If you create a table of unique words, and then another table that relates each field to each of its unique words, you can find all the fields that contain a word very quickly. These searches can be combined to find all the fields that contain a combination of words, or any one of a set of words etc.

In your case you might want to keep separate tables to relate the words of each field_number to the field_data, or add the field number to the relation table. Either of these will make searching for a combination of a word in a particular field much faster.

This way you trade off some extra work when you insert the data for more speed when you read the data.

Once you have a field id, you can get full text of the field and the object id (so that you can get whatever other info you have for the object) very quickly.

This all assumes you are trying to match various words in the field data, and not the entire text of the field data. The later is much simpler.

The client first wanted to store that data in an XML file,
but I was able to convince him to use a DB.

Good, you probably saved yourself a lot of work, and them a lot of grief. :-)

HTH
Dennis Cote

Reply via email to