Hi all

I am trying to put in place a simple replication process to copy a database 
from one machine to another.
The table I have is something like
CREATE TABLE sn2uid(sn VARCHAR(100) NOT NULL, uid INTEGER NOT NULL, PRIMARY KEY 
(sn));
CREATE INDEX uidindex ON sn2uid ( uid );

Where the (sn,uid) pairs are pretty much random.

On my test data, I have around 3 million entries, the size on disk being about 
280 MB.
If I do a 'select * from sn2uid' > db, I get around 100 MB worth of data. I was 
thinking of simply streaming the result of that query over TCP (HTTP really) 
and doing the inserts on the other side...
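
Roughly, what I have in mind is something like this (the host name, port and 
URL are just placeholders, not something I have actually set up):

sqlite3 myorg.db 'SELECT * FROM sn2uid' | curl -T - http://otherhost:8080/sn2uid

with the receiving side turning each sn|uid line back into an INSERT and 
running it against the new db.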

The problem I have is that, while doing this select takes about 10 seconds on 
my machine, I haven't found any way to insert the data quickly on the other 
machine.

After a while, the db file size grows very, very slowly, even when using 
transactions.

My question is:

Is there a way to do a select or a .dump so that when inserting the data on the 
other end, things will be faster? Or maybe there are some pragmas I can use 
that would improve performance?
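
For instance, I was wondering whether something along these lines (just 
guesses on my part, I haven't measured any of it) would help around the 
inserts:

PRAGMA synchronous = OFF;
PRAGMA cache_size = 100000;
BEGIN;
-- all the INSERT statements here
COMMIT;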

To me, it seems that the reason things are slow is that even though I do the 
inserts in a transaction, the btrees are modified independently, and in this 
case in random order. If I were getting the data in the right order in terms 
of the btree, I think things could be significantly faster...
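
For example, what I mean is dumping in primary key order and only building the 
secondary index afterwards, something like this (assuming ORDER BY sn matches 
the order of the primary key btree):

-- on the source side
SELECT sn, uid FROM sn2uid ORDER BY sn;
-- on the destination side, only after all the inserts are done
CREATE INDEX uidindex ON sn2uid ( uid );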

What I tried was simply something like:
sqlite3 myorg.db '.dump sn2uid' > ~/sn2uid.txt
sqlite3 mynew.db < ~/sn2uid.txt

Would grouping the inserts into batches of 10000 or so make things faster than 
one gigantic transaction? I am wondering in particular whether the btree 
insert code is smart enough to build the tree and merge it into the main db 
file faster in that case.

Thanks!

Nicolas
