On 1/29/07, David Champagne <[EMAIL PROTECTED]> wrote:
Hi,

I've got an application that creates a database with a large number of records
(millions), and the indexing of the tables is taking a long time.  Once the
database is initially created, it is never modified.  No records are added or
deleted.  So, in the case where a user has sufficient memory, I want to offer
the option to create the entire database in memory, including the indexes, and
then serialize that to disk.  I am not sure if this is really going to work or
if it will be more efficient.

I had a similar problem, and going this route was very much worth it
for me, though you might find that it's not worth it for you. The only
way to know for sure is to try both methods and compare the results.


I am using the "ATTACH DATABASE" command to create a copy of my ":memory:"
database.  I can copy the tables (see below), and I can get a list of the
indexes (see further below), but I don't know how to copy the indexes.

This will return the SQL statements used to create all indexes in
your db. You just need to execute each row returned:

SELECT sql FROM sqlite_master WHERE type='index' AND sql IS NOT NULL

The IS NOT NULL check at the end filters out the autoindexes
created implicitly by SQLite for PRIMARY KEY and UNIQUE
constraints.
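As a sketch of the whole copy step, assuming a Python script with the standard sqlite3 module (the table and index names here are made-up examples, not from the original post):

```python
import os
import sqlite3
import tempfile

# Sketch: build a small database in memory, then copy its tables AND
# indexes into an attached disk database.  Names are hypothetical.
disk_path = os.path.join(tempfile.mkdtemp(), "copy.db")

mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t1(name TEXT, age INT)")
mem.execute("CREATE INDEX t1_idx1 ON t1(name)")
mem.executemany("INSERT INTO t1 VALUES (?, ?)",
                [("alice", 30), ("bob", 25)])

mem.execute("ATTACH DATABASE ? AS dbdisk", (disk_path,))
mem.execute("CREATE TABLE dbdisk.t1 AS SELECT * FROM t1")

# Re-run each stored CREATE INDEX statement, qualifying the index name
# with the attached schema so the index lands in the disk database.
stmts = mem.execute(
    "SELECT sql FROM sqlite_master "
    "WHERE type='index' AND sql IS NOT NULL").fetchall()
for (sql,) in stmts:
    mem.execute(sql.replace("CREATE INDEX ", "CREATE INDEX dbdisk.", 1))
mem.commit()
mem.close()
```

Note that in SQLite the schema qualifier goes on the index name (CREATE INDEX dbdisk.t1_idx1 ON t1 ...), not on the table name; the naive string replace above relies on that.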

Also, does anyone know if this method is really going to be faster (memory
db -> disk db) than doing everything with a disk db?  Would the indexes
really be copied or just re-created?

They'll be re-created. But in order to achieve faster index creation
times for the disk db, you need to populate your disk db with presorted
data. Something like this:

CREATE TABLE dbdisk.table1 AS SELECT * FROM table1 ORDER BY index_col

index_col being the column on which you want to create an index in the
disk db. You might find that it's not even worth it to create an index
in the memory db; just execute the above statement. In my limited
experience with this, it takes about the same amount of time, but
again, your mileage will vary.
In my case this was all easy to do, since I control the schema and I
only have one index in my table. If you don't control the schema, I
don't think this is worth the hassle, if it's even possible.
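A minimal sketch of that direct route, again assuming Python's sqlite3 module and made-up table/column names: unsorted rows are staged in memory, written to the disk db already ordered by the future index key, and the index is built last.

```python
import os
import sqlite3
import tempfile

# Sketch: populate the disk table with presorted data, then create the
# index over rows that are already in index order.  Names are hypothetical.
disk_path = os.path.join(tempfile.mkdtemp(), "sorted.db")

mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE table1(index_col INT, payload TEXT)")
mem.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(3, "c"), (1, "a"), (2, "b")])  # deliberately unsorted

mem.execute("ATTACH DATABASE ? AS dbdisk", (disk_path,))
# Rows arrive at the disk table already sorted by the future index key.
mem.execute("CREATE TABLE dbdisk.table1 AS "
            "SELECT * FROM table1 ORDER BY index_col")
mem.execute("CREATE INDEX dbdisk.table1_idx ON table1(index_col)")
mem.commit()
mem.close()
```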

The trick, if I can call it that, is to create the index in the disk db
when the data is already sorted. That way you improve both index creation
speed and locality of reference, which should translate into somewhat
better query speed if your db is big. To get the most out of this, you'd
want to avoid any implicit indexes except for INTEGER PRIMARY KEY.
For example, if you have a schema like this:
For example, if you have a schema like this:

CREATE TABLE t1(name TEXT PRIMARY KEY, age INT);

You'd want to transform that into something more like this:

CREATE TABLE t1(name TEXT, age INT);
CREATE INDEX t1_idx1 ON t1(name);
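The difference is visible in sqlite_master: the first schema gets an implicit autoindex (with a NULL sql column) that is maintained row by row at insert time, while the second lets you build the index only after the sorted bulk load. A quick check, assuming Python's sqlite3 module (the two schemas are put in separate tables, t1 and t2, only so they can live in one demo database):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# TEXT PRIMARY KEY forces an implicit autoindex, built incrementally as
# rows are inserted -- it cannot benefit from presorted loading.
con.execute("CREATE TABLE t1(name TEXT PRIMARY KEY, age INT)")
auto = con.execute("SELECT name FROM sqlite_master "
                   "WHERE type='index' AND sql IS NULL").fetchall()

# The transformed schema: no implicit index.  The explicit index can be
# created after the table has been loaded with presorted data.
con.execute("CREATE TABLE t2(name TEXT, age INT)")
# ... bulk-load t2 in name order here ...
con.execute("CREATE INDEX t2_idx1 ON t2(name)")
```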

This whole thing will probably provide no significant gain if your
resulting disk db is less than 50-100 MB.

--
Nemanja Corlija <[EMAIL PROTECTED]>

