Dennis and Jay,

thanks to both of you.

I like the idea of indexing the words as FT3 does,
I´ll look into that.

Denis,
I didn´t knew references where possible in Sqlite 3.2,
thanks for pointing that out!
Do you have more specific information on how that is implemented in SQLite?
Would you think the amount of memory needed would be acceptable for an
field_data with
1.000.000 objects * 21 fields * 1.000 chars ?
It is quite unlikely to reach that DB size,
but I have to expect the unexpected ;)

Thanks again,

André 

-----Original Message-----
From: Dennis Cote [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 10, 2006 8:54 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] How to handle large amount of data?

André Goliath wrote:

>HI List,
> 
>I´m curently in the design phase of a rather large DB project
>and would love to hear your qualified thoughts on this.
> 
>You can think of the data to be stored as objects with some properties, an
>unique ID and then 21 fields which will contain TEXT data,
> 
>The DB acts as kind of archieve that allows the user to search in the
>object´s text fields.
>Each field will contain around 100 to 1.000 chars, and per object only 7-12
>fields will be used.
>I´m expecting an amount of about 1.000.000 objects to be stored.
> 
>I thought of organizing the whole archive in three tables:
>one for the general properties of the object,
>one that contains the object´s ID and the text columns 1 to 9
>and one table that will contain the ID and the other columns
> 
>I splitted the data up like this because in most cases the user will only
>search for data in the first 9 columns,
>so in that case the DB would not need to scan the whole data.
> 
>Does this layout makes sense?
>The queries run on that table will be simple lookups for words in the text
>with options simliar to Google (exact phrase, single word, all words,..)
>He can also select which colums to search in.
> 
>If you have any thoughts about this design or ideas on creating indexes or
>something else what might improve performance I would
>love to hear your thoughts!
> 
>TIA
> 
>André
> 
>
>  
>
Andre,

Generally, when you have a variable number of fields in a relational 
database, you should normalize it. In this case you have a variable 
number of text fields associated with each object. So you should store 
the objects in one table, which has a unique id for each object. You 
should store all the text data as rows in a separate table. Each row in 
the second table will contain the id of the object it is associated 
with, the field number that this row corresponds to for that object, and 
the field's text. You will have one table with one row per object and a 
second with 7-12 rows per object, thought there could be anywhere from 0 
to 21 rows for any particular object.

  create table object (
    object_id integer primary_key,
    object_name text
  );

  create table field_data (
    object_id integer references object;
    field_num integer;
    field_text text;
  );

You can combine the two table using SQL joins to get all the fields 
associated with an object.

  select * from field_data where object_id = ?;

You can search a particular field

  select object_id from field_data
  where field_num = 10 and field_text like (select '%' || ? || '%');

You can also search through a set or range of fields

  select object_id from field_data
  where field_num between 1 and 9
  and field_text like (select '%' || ? || '%');

  select object_id from field_data
  where field_num in (3, 5, 6, 8, 9, 10, 15)
  and field_text like (select '%' || ? || '%');


HTH
Dennis Cote

Reply via email to