RE: [sqlite] How to handle large amount of data?

André Goliath Tue, 11 Apr 2006 10:26:26 -0700

wow, I really have to learn much more about the SQL, thanks for your great
help!


-----Original Message-----
From: Dennis Cote [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 7:13 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] How to handle large amount of data?

André Goliath wrote:

>Unless you have ebtter suggestions my DB will look like this.
>('Takes' are the objects)
>
>
>CREATE TABLE [takefields] 
>(
>       [take] integer NOT NULL , 
>       [field] integer NOT NULL , 
>       [content] memo,
>       PRIMARY KEY ([field])
>);
>
Primary keys must be unique. You will have many rows with the same field 
number, but you will have only one field for each take, so your primary 
key should be the combination of take and field. The datatype 'memo' 
will give the content column numeric affinity in SQLite (see 
http://www.sqlite.org/datatype3.html). I would suggest this instead;

    CREATE TABLE [takefields]
    (
        [take] integer NOT NULL references[takes]([ID]) ,
        [field] integer NOT NULL ,
        [content] text,
        PRIMARY KEY ([take], [field])
    );

This will let you quickly locate all the fields of a given take, but 
will still be slow finding all the fields with a given number. To speed 
this up you need to add an index that groups the records by field number 
first.

    CREATE INDEX [fieldsindex] on [takefields]([field], [take]);

>--
>CREATE TABLE [takes] 
>(
>       [ID] integer NOT NULL , 
>       [inpoint] integer, 
>       [outpoint] integer, 
>       [date] integer, 
>       [screen] varchar (255), 
>       [comment] varchar (255),
>       PRIMARY KEY ([ID])
>);
>
Primary keys can't be null so the not null constraint on ID is 
redundant. I also find it is clearer to keep the primary key constraint 
with the column declaration  for a single column primary key.

    CREATE TABLE [takes]
    (
        [ID] integer PRIMARY KEY ,
        [inpoint] integer,
        [outpoint] integer,
        [date] integer,
        [screen] varchar (255),
        [comment] varchar (255)
    );

>--
>CREATE TABLE [wordlist] 
>(
>       [field] integer, 
>       [take] integer, 
>       [word] varchar (255),
>       PRIMARY KEY ([word])
>);
>
>
This table should be split into two related tables as well. One that 
holds the unique words, and one that relates the words to the fields.

    CREATE TABLE [wordlist]
    (
        [ID] integer PRIMARY KEY,
        [word] varchar (255) UNIQUE
    );

    CREATE TABLE [wordfields]
    (
        [wordid] integer references [wordlist]([ID]),
        [field] integer,
        [take] integer,
        PRIMARY KEY ([wordid], [field], [take])
    );

>IIRC INDEXes are only a kind of copy of the Table they where crated on with
>lesser columns, right?
>So I don´t think an INDEX would improve the performance here, am I right?
>
Well sort of, indexes are copies of parts of the table. The key is that 
indexes are ordered in a way that is useful to speed up queries. An 
index allows the database engine to go directly to a record, or to the 
start of a group of records, with the required value.

With the table definitions above you have indexes on takes(id), 
takefields(take, field), takefields(field, take), wordlist(id), 
wordlist(word), wordfields(wordid, field, take). Note the unique 
constraint on the word column in wordlist creates an index.

>
>The typical User´s request would be:
>"Get me all Takes in which the words "Foo" and "Bar" appear, but only in
>fields 1,3 or 5 
>
>
Given the tables above, this query should do what you want.  

    -- get info from all takes with Foo and Bar in fields 1, 3, or 5
    select takes.* from takes where takes.id in
    (
        --get id of all takes with Foo in field 1, 3, or 5
        select fields1.take
        from wordlist as word1
        join wordfields as fields1 on word1.id = fields1.wordid
        where word1.word = 'Foo'
        and fields1.field in (1, 3, 5)
    intersect
        --get id of all takes with Bar in field 1, 3, or 5
        select fields2.take
        from wordlist as word2
        join wordfields as fields2 on word2.id = fields2.wordid
        where word2.word = 'Bar'
        and fields2.field in (1, 3, 5)
    );

If you want the actual content of the fields for those takes you would 
join this result with the takefields table something like this.

    -- ordered list of all take fields with Foo and Bar in fields 1, 3, or 5
    select takes.*, takefields.field, takefields.content
    from takes
    left join takefields on takes.id = takefields.take
    where takes.id in
    (
        --get id of all takes with Foo in field 1, 3, or 5
        select fields1.take
        from wordlist as word1
        join wordfields as fields1 on word1.id = fields1.wordid
        where word1.word = 'Foo'
        and fields1.field in (1, 3, 5)
    intersect
        --get id of all takes with Bar in field 1, 3, or 5
        select fields2.take
        from wordlist as word2
        join wordfields as fields2 on word2.id = fields2.wordid
        where word2.word = 'Bar'
        and fields2.field in (1, 3, 5)
    )
    and takefields.field in (1, 3, 5)
    order by takes.id, takefields.field;

I'm sure there are other ways to join the tables to get the same result, 
and some might execute faster than this. You will have to try them out 
with some real data.

HTH
Dennis Cote

RE: [sqlite] How to handle large amount of data?

Reply via email to