Re: [sqlite] How to handle large amount of data?

Dennis Cote Tue, 11 Apr 2006 10:14:26 -0700

André Goliath wrote:

Unless you have ebtter suggestions my DB will look like this.
('Takes' are the objects)
CREATE TABLE [takefields]([take] integer NOT NULL ,[field] integer NOT NULL ,[content] memo,
        PRIMARY KEY ([field])
);

Primary keys must be unique. You will have many rows with the same fieldnumber, but you will have only one field for each take, so your primarykey should be the combination of take and field. The datatype 'memo'will give the content column numeric affinity in SQLite (seehttp://www.sqlite.org/datatype3.html). I would suggest this instead;


   CREATE TABLE [takefields]
   (
       [take] integer NOT NULL references[takes]([ID]) ,
       [field] integer NOT NULL ,
       [content] text,
       PRIMARY KEY ([take], [field])
   );

This will let you quickly locate all the fields of a given take, butwill still be slow finding all the fields with a given number. To speedthis up you need to add an index that groups the records by field numberfirst.


   CREATE INDEX [fieldsindex] on [takefields]([field], [take]);

--
CREATE TABLE [takes]([ID] integer NOT NULL ,[inpoint] integer,[outpoint] integer,[date] integer,[screen] varchar (255),[comment] varchar (255),
        PRIMARY KEY ([ID])
);

Primary keys can't be null so the not null constraint on ID isredundant. I also find it is clearer to keep the primary key constraintwith the column declaration for a single column primary key.


   CREATE TABLE [takes]
   (
       [ID] integer PRIMARY KEY ,
       [inpoint] integer,
       [outpoint] integer,
       [date] integer,
       [screen] varchar (255),
       [comment] varchar (255)
   );

--
CREATE TABLE [wordlist]([field] integer,[take] integer,[word] varchar (255),
        PRIMARY KEY ([word])
);

This table should be split into two related tables as well. One thatholds the unique words, and one that relates the words to the fields.


   CREATE TABLE [wordlist]
   (
       [ID] integer PRIMARY KEY,
       [word] varchar (255) UNIQUE
   );

   CREATE TABLE [wordfields]
   (
       [wordid] integer references [wordlist]([ID]),
       [field] integer,
       [take] integer,
       PRIMARY KEY ([wordid], [field], [take])
   );

IIRC INDEXes are only a kind of copy of the Table they where crated on with
lesser columns, right?
So I don´t think an INDEX would improve the performance here, am I right?

Well sort of, indexes are copies of parts of the table. The key is thatindexes are ordered in a way that is useful to speed up queries. Anindex allows the database engine to go directly to a record, or to thestart of a group of records, with the required value.

With the table definitions above you have indexes on takes(id),takefields(take, field), takefields(field, take), wordlist(id),wordlist(word), wordfields(wordid, field, take). Note the uniqueconstraint on the word column in wordlist creates an index.


The typical User´s request would be:
"Get me all Takes in which the words "Foo" and "Bar" appear, but only in

fields 1,3 or 5

Given the tables above, this query should do what you want.

   -- get info from all takes with Foo and Bar in fields 1, 3, or 5
   select takes.* from takes where takes.id in
   (
       --get id of all takes with Foo in field 1, 3, or 5
       select fields1.take
       from wordlist as word1
       join wordfields as fields1 on word1.id = fields1.wordid
       where word1.word = 'Foo'
       and fields1.field in (1, 3, 5)
   intersect
       --get id of all takes with Bar in field 1, 3, or 5
       select fields2.take
       from wordlist as word2
       join wordfields as fields2 on word2.id = fields2.wordid
       where word2.word = 'Bar'
       and fields2.field in (1, 3, 5)
   );

If you want the actual content of the fields for those takes you wouldjoin this result with the takefields table something like this.


   -- ordered list of all take fields with Foo and Bar in fields 1, 3, or 5
   select takes.*, takefields.field, takefields.content
   from takes
   left join takefields on takes.id = takefields.take
   where takes.id in
   (
       --get id of all takes with Foo in field 1, 3, or 5
       select fields1.take
       from wordlist as word1
       join wordfields as fields1 on word1.id = fields1.wordid
       where word1.word = 'Foo'
       and fields1.field in (1, 3, 5)
   intersect
       --get id of all takes with Bar in field 1, 3, or 5
       select fields2.take
       from wordlist as word2
       join wordfields as fields2 on word2.id = fields2.wordid
       where word2.word = 'Bar'
       and fields2.field in (1, 3, 5)
   )
   and takefields.field in (1, 3, 5)
   order by takes.id, takefields.field;

I'm sure there are other ways to join the tables to get the same result,and some might execute faster than this. You will have to try them outwith some real data.


HTH
Dennis Cote

Re: [sqlite] How to handle large amount of data?

Reply via email to