I think that's exactly what I'm going to do.
I'll have to get the OK from my customer regarding the longer
INSERT/UPDATE times, but I think he will be fine with that.
Unless you have better suggestions, my DB will look like this
('takes' are the objects):


CREATE TABLE [takefields] 
(
        [take] integer NOT NULL , 
        [field] integer NOT NULL , 
        [content] text, -- SQLite has no memo type; text gives TEXT affinity
        PRIMARY KEY ([take], [field]) -- field numbers repeat across takes
);
--
CREATE TABLE [takes] 
(
        [ID] integer NOT NULL , 
        [inpoint] integer, 
        [outpoint] integer, 
        [date] integer, 
        [screen] varchar (255), 
        [comment] varchar (255),
        PRIMARY KEY ([ID])
);
--
CREATE TABLE [wordlist] 
(
        [field] integer, 
        [take] integer, 
        [word] varchar (255),
        PRIMARY KEY ([word], [take], [field]) -- the same word occurs in many takes
);

IIRC an INDEX is only a kind of copy of the table it was created on, with
fewer columns, right?
So I don't think an INDEX would improve the performance here, am I right?
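In case an index would help after all, this is what I would try (just my
untested guess):

-- Untested guess: an index matching the word + field lookup directly.
CREATE INDEX [idx_wordlist_word_field] ON [wordlist] ([word], [field], [take]);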

The typical user's request would be:
"Get me all takes in which the words 'Foo' and 'Bar' appear, but only in
fields 1, 3, or 5."
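Against the tables above, I imagine that query would look roughly like this
(a sketch on my part, not tested):

-- Sketch (untested): all takes where both 'Foo' and 'Bar' occur
-- somewhere in fields 1, 3 or 5.
SELECT DISTINCT w1.take
FROM   [wordlist] w1
JOIN   [wordlist] w2 ON w2.take = w1.take
WHERE  w1.word = 'Foo' AND w1.field IN (1, 3, 5)
AND    w2.word = 'Bar' AND w2.field IN (1, 3, 5);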

Thanks for your help!
  

-----Original Message-----
From: Dennis Cote [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 1:00 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] How to handle large amount of data?

André Goliath wrote:

>Do you (or anyone else) have any experience with those large DBs in terms
>of SELECT execution times?
>ms, seconds or minutes?
>
>Since I would mainly use field numbers in the WHERE clause I'd create an
>index on them, I guess?
>
With an index on the field number and the object id, you should be able 
to retrieve a particular field for a particular object in a few ms. To 
search through the text of one particular field for all one million 
objects will still take a while, even with this index. You will be 
reading in and scanning through about 1 GB of data (1E6 objects x 1 
field per object x 1E3 bytes per field).
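Using the [takefields] table from above just for illustration, such an
index might look like:

-- Illustrative: index on field number plus object id, so one particular
-- field of one particular object is found without a table scan.
CREATE INDEX [idx_takefields_field_take] ON [takefields] ([field], [take]);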

This is where the full text search mechanisms come in handy. Your 1 GB 
of data may only contain 10K unique words, and each 1 KB field probably 
has only about 200 words. If you create a table of unique words, and 
then another table that relates each field to each of its unique words, 
you can find all the fields that contain a word very quickly. These 
searches can be combined to find all the fields that contain a 
combination of words, or any one of a set of words etc.
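As a sketch (all table and column names here are illustrative only):

-- Illustrative: one row per unique word, one row per (field, word) pair.
CREATE TABLE [words]
(
        [word_id] integer PRIMARY KEY,
        [word]    varchar (255) UNIQUE
);
CREATE TABLE [field_words]
(
        [field_id] integer,
        [word_id]  integer,
        PRIMARY KEY ([field_id], [word_id])
);

-- All fields that contain both 'foo' and 'bar':
SELECT field_id FROM field_words
        JOIN words USING (word_id) WHERE word = 'foo'
INTERSECT
SELECT field_id FROM field_words
        JOIN words USING (word_id) WHERE word = 'bar';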

In your case you might want to keep separate tables to relate the words 
of each field_number to the field_data, or add the field number to the 
relation table. Either of these will make searching for a combination of 
a word in a particular field much faster.
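With the field number added, the relation table from the sketch above
might become (again illustrative):

-- Illustrative: the field number carried along in the relation table.
CREATE TABLE [field_words]
(
        [object_id]    integer,
        [field_number] integer,
        [word_id]      integer,
        PRIMARY KEY ([word_id], [field_number], [object_id])
);

-- All objects where 'foo' appears in field 3:
SELECT fw.object_id
FROM   field_words fw
JOIN   words w ON w.word_id = fw.word_id
WHERE  w.word = 'foo' AND fw.field_number = 3;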

This way you trade off some extra work when you insert the data for more 
speed when you read the data.
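The insert side of that trade-off is then roughly this per word parsed out
of a new field (a sketch; the object id 42 and field number 3 are made up):

-- Illustrative: register one word occurrence while inserting a field.
-- Wrapping all of a field's words in one transaction keeps this fast.
INSERT OR IGNORE INTO words (word) VALUES ('foo');
INSERT INTO field_words (object_id, field_number, word_id)
        SELECT 42, 3, word_id FROM words WHERE word = 'foo';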

Once you have a field id, you can get full text of the field and the 
object id (so that you can get whatever other info you have for the 
object) very quickly.
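For example, with the [takefields] table from above (the ids are made up):

-- Illustrative: fetch the object id and the full text of one matched field.
SELECT take, content FROM takefields
WHERE  take = 42 AND field = 3;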

This all assumes you are trying to match various words in the field 
data, and not the entire text of the field data. The latter is much simpler.
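In the whole-text case a plain equality test is enough, something like:

-- Illustrative: match the entire text of a field exactly.
SELECT take FROM takefields
WHERE  field = 3 AND content = 'the entire field text';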

>The client first wanted to store that data in an XML file,
>but I was able to convince him to use a DB.
>
Good, you probably saved yourself a lot of work, and them a lot of 
grief. :-)

HTH
Dennis Cote
