On Sunday, 2 December, 2018 12:57, Simon Slavin <slav...@bigfraud.org> wrote:
>On 2 Dec 2018, at 7:29pm, E.Pasma <pasm...@concepts.nl> wrote: >> drop table x; >> create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID; >> >> insert into x select random() from generate_series where start=1 >and stop=10000000; >> Run Time: real 88.759 user 36.276227 sys 44.190566 > >Realtime is 88.759 > >> create table x(value INTEGER NOT NULL); >> >> insert into x select random() from generate_series where start=1 >and stop=10000000; >> Run Time: real 3.980 user 2.266869 sys 0.124012 >> >> create unique index ix on x(value); >> Run Time: real 10.131 user 7.623369 sys 0.797015 > >Total realtime 14.111 > >I had not expected that much of a difference. > >The long time for the WITHOUT ROWID sample is caused by the index >being built as the rows are inserted. Since the values are added in >a random order, the tree is being expanded at random points each >time. In contrast, CREATE UNIQUE INDEX is able to form the index >tree in a better-optimised way, since it has all the values it needs >available when it starts. Well, yes and no. CREATE INDEX as a separate step is extracting the keys, sorting them, then building the tree with in-order inserts. When the key is part of the table (whether with or without rowid) and the inserts are in random order, the tree must be re-balanced much more often (in the worst case on each insert) rather than the case of only per level for in-order inserts. Most of this can be overcome by simply having a larger cache size to permit (more of) those operations to be performed in RAM rather than putting a continuous I/O load on the system. Cache size and the number of threads used for sorting has a huge effect on performance (this time, to an on-disk database on an SSD) with various cache sizes and threads, for 100,000,000 records (a smaller number of records does not produce meaningful differences for me when using bigger configurations): >sqlite test.db < test.sql pragma cache_size=1048576; pragma threads=8; pragma temp_store=1; .timer on --- drop table if exists x; Run Time: real 0.000 user 0.000000 sys 0.000000 create table x(value INTEGER PRIMARY KEY); Run Time: real 0.018 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 190.586 user 185.531250 sys 3.359375 --- drop table x; Run Time: real 1.468 user 1.437500 sys 0.000000 create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID; Run Time: real 0.013 user 0.015625 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 180.595 user 176.437500 sys 1.734375 --- drop table x; Run Time: real 0.946 user 0.906250 sys 0.015625 create table x(value INTEGER NOT NULL); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 22.569 user 19.703125 sys 2.078125 create unique index ix on x(value); Run Time: real 49.409 user 136.390625 sys 12.484375 --- drop table x; Run Time: real 1.940 user 1.875000 sys 0.031250 create table x(value INTEGER NOT NULL); Run Time: real 0.019 user 0.000000 sys 0.000000 create unique index ix on x(value); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 207.955 user 202.125000 sys 4.000000 --- .exit >sqlite test.db < test.sql pragma cache_size=1048576; pragma threads=0; 0 pragma temp_store=1; .timer on --- drop table if exists x; Run Time: real 8.282 user 2.515625 sys 5.734375 create table x(value INTEGER PRIMARY KEY); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 189.177 user 182.656250 sys 3.187500 --- drop table x; Run Time: real 1.465 user 1.437500 sys 0.000000 create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID; Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 178.972 user 172.265625 sys 2.937500 --- drop table x; Run Time: real 0.938 user 0.890625 sys 0.000000 create table x(value INTEGER NOT NULL); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 25.313 user 19.703125 sys 2.812500 create unique index ix on x(value); Run Time: real 117.165 user 106.921875 sys 6.421875 --- drop table x; Run Time: real 2.037 user 1.953125 sys 0.046875 create table x(value INTEGER NOT NULL); Run Time: real 0.027 user 0.000000 sys 0.015625 create unique index ix on x(value); Run Time: real 0.018 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 203.974 user 198.265625 sys 3.875000 --- .exit All else being equal, allowing SQLite to create multiple threads for sorting when creating the index separately has an enormous impact. So now lets try the same thing with a smaller cache_size (above is 4 GB, below is 1 GB, still absolutely huge in comparison to the default): >sqlite test.db < test.sql pragma cache_size=65536; pragma threads=8; 8 pragma temp_store=1; .timer on --- drop table if exists x; Run Time: real 7.527 user 2.062500 sys 5.437500 create table x(value INTEGER PRIMARY KEY); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 1044.661 user 277.500000 sys 754.500000 --- drop table x; Run Time: real 3.567 user 1.656250 sys 1.906250 create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID; Run Time: real 0.013 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 995.853 user 259.609375 sys 722.453125 --- drop table x; Run Time: real 2.967 user 0.828125 sys 2.093750 create table x(value INTEGER NOT NULL); Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 26.038 user 19.046875 sys 4.375000 create unique index ix on x(value); Run Time: real 46.088 user 132.109375 sys 15.718750 --- drop table x; Run Time: real 7.166 user 1.765625 sys 5.375000 create table x(value INTEGER NOT NULL); Run Time: real 0.018 user 0.000000 sys 0.000000 create unique index ix on x(value); Run Time: real 0.013 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 1204.187 user 307.687500 sys 880.281250 --- .exit >sqlite test.db < test.sql pragma cache_size=65536; pragma threads=0; 0 pragma temp_store=1; .timer on --- drop table if exists x; Run Time: real 7.482 user 2.640625 sys 4.812500 create table x(value INTEGER PRIMARY KEY); Run Time: real 0.013 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 1047.682 user 279.687500 sys 757.875000 --- drop table x; Run Time: real 3.567 user 1.390625 sys 2.140625 create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID; Run Time: real 0.014 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 995.550 user 263.703125 sys 722.578125 --- drop table x; Run Time: real 2.942 user 0.781250 sys 2.125000 create table x(value INTEGER NOT NULL); Run Time: real 0.013 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 25.967 user 19.500000 sys 4.000000 create unique index ix on x(value); Run Time: real 112.580 user 100.156250 sys 9.718750 --- drop table x; Run Time: real 7.060 user 2.031250 sys 4.953125 create table x(value INTEGER NOT NULL); Run Time: real 0.016 user 0.000000 sys 0.000000 create unique index ix on x(value); Run Time: real 0.017 user 0.000000 sys 0.000000 insert into x select random() from generate_series where start=1 and stop=100000000; Run Time: real 1202.757 user 305.546875 sys 885.656250 --- .exit _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users