On Sunday, 2 December, 2018 12:57, Simon Slavin <slav...@bigfraud.org> wrote:

>On 2 Dec 2018, at 7:29pm, E.Pasma <pasm...@concepts.nl> wrote:

>> drop table x;
>> create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
>>
>> insert into x select random() from generate_series where start=1
>and stop=10000000;
>> Run Time: real 88.759 user 36.276227 sys 44.190566
>
>Realtime is 88.759
>
>> create table x(value INTEGER NOT NULL);
>>
>> insert into x select random() from generate_series where start=1
>and stop=10000000;
>> Run Time: real 3.980 user 2.266869 sys 0.124012
>>
>> create unique index ix on x(value);
>> Run Time: real 10.131 user 7.623369 sys 0.797015
>
>Total realtime 14.111
>
>I had not expected that much of a difference.
>
>The long time for the WITHOUT ROWID sample is caused by the index
>being built as the rows are inserted.  Since the values are added in
>a random order, the tree is being expanded at random points each
>time.  In contrast, CREATE UNIQUE INDEX is able to form the index
>tree in a better-optimised way, since it has all the values it needs
>available when it starts.

Well, yes and no.  CREATE INDEX as a separate step is extracting the keys, 
sorting them, then building the tree with in-order inserts.  When the key is 
part of the table (whether with or without rowid) and the inserts are in random 
order, the tree must be re-balanced much more often (in the worst case on each 
insert) rather than the case of only per level for in-order inserts.

Most of this can be overcome by simply having a larger cache size to permit 
(more of) those operations to be performed in RAM rather than putting a 
continuous I/O load on the system.  

Cache size and the number of threads used for sorting has a huge effect on 
performance (this time, to an on-disk database on an SSD) with various cache 
sizes and threads, for 100,000,000 records (a smaller number of records does 
not produce meaningful differences for me when using bigger configurations):

>sqlite test.db < test.sql
pragma cache_size=1048576;
pragma threads=8;
pragma temp_store=1;
.timer on
---
drop table if exists x;
Run Time: real 0.000 user 0.000000 sys 0.000000
create table x(value INTEGER PRIMARY KEY);
Run Time: real 0.018 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 190.586 user 185.531250 sys 3.359375
---
drop table x;
Run Time: real 1.468 user 1.437500 sys 0.000000
create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
Run Time: real 0.013 user 0.015625 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 180.595 user 176.437500 sys 1.734375
---
drop table x;
Run Time: real 0.946 user 0.906250 sys 0.015625
create table x(value INTEGER NOT NULL);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 22.569 user 19.703125 sys 2.078125
create unique index ix on x(value);
Run Time: real 49.409 user 136.390625 sys 12.484375
---
drop table x;
Run Time: real 1.940 user 1.875000 sys 0.031250
create table x(value INTEGER NOT NULL);
Run Time: real 0.019 user 0.000000 sys 0.000000
create unique index ix on x(value);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 207.955 user 202.125000 sys 4.000000
---
.exit

>sqlite test.db < test.sql
pragma cache_size=1048576;
pragma threads=0;
0
pragma temp_store=1;
.timer on
---
drop table if exists x;
Run Time: real 8.282 user 2.515625 sys 5.734375
create table x(value INTEGER PRIMARY KEY);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 189.177 user 182.656250 sys 3.187500
---
drop table x;
Run Time: real 1.465 user 1.437500 sys 0.000000
create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 178.972 user 172.265625 sys 2.937500
---
drop table x;
Run Time: real 0.938 user 0.890625 sys 0.000000
create table x(value INTEGER NOT NULL);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 25.313 user 19.703125 sys 2.812500
create unique index ix on x(value);
Run Time: real 117.165 user 106.921875 sys 6.421875
---
drop table x;
Run Time: real 2.037 user 1.953125 sys 0.046875
create table x(value INTEGER NOT NULL);
Run Time: real 0.027 user 0.000000 sys 0.015625
create unique index ix on x(value);
Run Time: real 0.018 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 203.974 user 198.265625 sys 3.875000
---
.exit

All else being equal, allowing SQLite to create multiple threads for sorting 
when creating the index separately has an enormous impact.  So now lets try the 
same thing with a smaller cache_size (above is 4 GB, below is 1 GB, still 
absolutely huge in comparison to the default):

>sqlite test.db < test.sql
pragma cache_size=65536;
pragma threads=8;
8
pragma temp_store=1;
.timer on
---
drop table if exists x;
Run Time: real 7.527 user 2.062500 sys 5.437500
create table x(value INTEGER PRIMARY KEY);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 1044.661 user 277.500000 sys 754.500000
---
drop table x;
Run Time: real 3.567 user 1.656250 sys 1.906250
create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
Run Time: real 0.013 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 995.853 user 259.609375 sys 722.453125
---
drop table x;
Run Time: real 2.967 user 0.828125 sys 2.093750
create table x(value INTEGER NOT NULL);
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 26.038 user 19.046875 sys 4.375000
create unique index ix on x(value);
Run Time: real 46.088 user 132.109375 sys 15.718750
---
drop table x;
Run Time: real 7.166 user 1.765625 sys 5.375000
create table x(value INTEGER NOT NULL);
Run Time: real 0.018 user 0.000000 sys 0.000000
create unique index ix on x(value);
Run Time: real 0.013 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 1204.187 user 307.687500 sys 880.281250
---
.exit

>sqlite test.db < test.sql
pragma cache_size=65536;
pragma threads=0;
0
pragma temp_store=1;
.timer on
---
drop table if exists x;
Run Time: real 7.482 user 2.640625 sys 4.812500
create table x(value INTEGER PRIMARY KEY);
Run Time: real 0.013 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 1047.682 user 279.687500 sys 757.875000
---
drop table x;
Run Time: real 3.567 user 1.390625 sys 2.140625
create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
Run Time: real 0.014 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 995.550 user 263.703125 sys 722.578125
---
drop table x;
Run Time: real 2.942 user 0.781250 sys 2.125000
create table x(value INTEGER NOT NULL);
Run Time: real 0.013 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 25.967 user 19.500000 sys 4.000000
create unique index ix on x(value);
Run Time: real 112.580 user 100.156250 sys 9.718750
---
drop table x;
Run Time: real 7.060 user 2.031250 sys 4.953125
create table x(value INTEGER NOT NULL);
Run Time: real 0.016 user 0.000000 sys 0.000000
create unique index ix on x(value);
Run Time: real 0.017 user 0.000000 sys 0.000000
insert into x select random() from generate_series where start=1 and 
stop=100000000;
Run Time: real 1202.757 user 305.546875 sys 885.656250
---
.exit




_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to