If you pre-sort the data before inserting it into sqlite, you will
see much better timings with both the default cache size and
the larger cache size.
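The same effect can be reproduced outside the Tcl harness. Here is a minimal Python sketch (an assumption of mine, not part of the original script; the in-memory database and index name are illustrative) that sorts each batch on the indexed column before inserting, so index pages are visited in order rather than at random:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; the benchmark used on-disk dbs
conn.execute("CREATE TABLE t1(a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX t1_b ON t1(b)")

# One batch of rows with random values in the indexed column b,
# mirroring the int(rand()*500000) values in the Tcl script.
rows = [(i, random.randrange(500000)) for i in range(20000)]

# Sorting the batch by the indexed column means the b-tree index is
# appended to mostly in order, so a small page cache is not thrashed.
rows.sort(key=lambda r: r[1])

with conn:  # one transaction per batch, as in the original script
    conn.executemany("INSERT INTO t1 VALUES(?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0])  # → 20000
```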
When I run your script (unmodified, except for the removal of
MySQL) up to 400000:
tablesize sqlite sqlite-largecache
0 1.956 1.956
20000 2.064 2.079
40000 2.095 2.111
60000 2.173 2.173
80000 2.189 2.283
100000 2.345 2.298
120000 2.345 2.360
140000 2.407 2.361
160000 2.470 2.423
180000 2.548 2.501
200000 2.564 2.439
220000 2.830 2.439
240000 2.876 2.907
260000 2.829 2.564
280000 3.423 3.533
300000 4.939 3.564
320000 7.236 3.736
340000 7.283 3.751
360000 10.611 3.767
380000 11.142 3.845
400000 13.736 3.798
When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly - no pre-sort need be performed here):
# set r [expr {int(rand()*500000)}]
set r [expr {($num+$i)*5}]
tablesize sqlite sqlite-largecache
0 1.878 1.894
20000 1.925 1.923
40000 1.923 1.923
60000 1.923 1.954
80000 1.970 1.923
100000 1.923 1.908
120000 1.923 1.970
140000 1.940 2.383
160000 2.048 1.908
180000 1.923 2.002
200000 1.923 1.923
220000 1.939 1.923
240000 1.923 1.923
260000 1.939 1.923
280000 1.939 1.939
300000 1.954 1.939
320000 1.939 1.923
340000 1.970 1.954
360000 1.939 1.954
380000 1.923 1.939
400000 1.970 1.939
I wonder if other databases pre-sort their batch inserts by
index order to achieve more consistent insert speed with
datasets that exceed the cache size.
----- Original Message ----
From: Peter De Rijk <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, October 28, 2006 10:40:03 AM
Subject: Re: [sqlite] serious performance problems with indexes
I have checked at a finer granularity (results and code at the end of
the mail), and it does indeed seem that the extreme growth of the time
needed for inserts only starts at around 160000 rows (on my system).
Before that it also rises, but much more slowly. As this looked
like a cache effect, I tried setting the default_cache_size
to something big (20000). That was enough to remove the bend (at
least up to 2000000 rows), so that the increase in time kept going
at the same slower pace as for the smaller data sizes.
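For readers who want to try the cache setting outside the Tcl harness, a minimal Python sketch follows (my addition, not part of the original test; note the script below uses PRAGMA default_cache_size, which persists the value in the database file, whereas per-connection PRAGMA cache_size achieves the same effect for a single session):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; the benchmark used lc.db

# Raise the page cache to 20000 pages, matching the "largecache"
# configuration in the benchmark (the default at the time was 2000).
conn.execute("PRAGMA cache_size=20000")

# Reading the pragma back confirms the setting for this connection.
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # → 20000
```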
Curiously, mysql, which I included as a reference, manages to
keep nearly the same insert time even for the largest data sets.
With the large cache size, sqlite is faster than mysql until about
700000 rows. At 2000000 rows mysql is about 3 times faster than sqlite
with the large cache (versus more than 20 times faster than sqlite
with the default cache size).
-------
tablesize sqlite sqlite-largecache mysql
0 1.12 1.04 2.86
20000 1.09 1.17 2.89
40000 1.14 1.15 3.27
60000 1.23 1.2 3.18
80000 1.47 1.25 3.01
100000 1.3 1.32 2.94
120000 1.4 1.34 2.94
140000 2.06 1.38 3
160000 2.41 1.46 2.99
180000 3.09 1.54 3.03
200000 3.31 1.62 3.04
220000 3.84 1.72 3.06
240000 4.22 1.7 3.03
260000 4.23 1.76 3.06
280000 4.83 1.94 2.98
300000 6.27 2.08 3.07
320000 9.02 2.06 3.11
340000 9.48 1.94 2.88
360000 10 2.13 3.04
380000 10.6 2.21 3.13
400000 10.72 2.35 3.05
420000 11.35 2.24 3.11
440000 11.55 2.49 3.05
460000 11.75 2.5 3.07
480000 16.89 2.39 3.1
500000 17.81 2.59 3.09
520000 19.22 2.4 3.03
540000 19.35 2.77 3.05
560000 19.44 2.59 3.1
580000 19.87 2.67 3.08
600000 21.47 2.88 3.09
620000 20.82 2.74 3.08
640000 21.55 3 3.11
660000 25.17 3.09 3.12
680000 29.89 2.84 3.16
700000 28.48 3.25 3.09
720000 28.62 3.31 3.2
740000 28.69 3.23 3.07
760000 30.34 3.59 3.1
780000 30.27 3.64 3.13
800000 30.57 3.37 3.14
820000 31.41 3.57 3.06
840000 31.89 3.89 3.09
860000 33.77 3.57 3.05
880000 36.01 3.87 3.1
900000 37.58 4.04 3.07
920000 40.59 3.68 3.04
940000 40.9 3.72 3.09
960000 41.45 4.24 3.1
980000 42.05 4.28 3.13
1000000 41.11 4.09 3.03
1020000 42.29 4.63 3.11
1040000 42.09 4.81 3.17
1060000 42.8 4.34 3.14
1080000 43.82 4.43 3.02
1100000 43.01 4.5 3.12
1120000 44.08 4.84 3.1
1140000 45.4 5.08 3.16
1160000 47.51 4.66 3.13
1180000 47.46 5.02 3.17
1200000 48.21 5.42 3.17
1220000 49.09 5.33 3.21
1240000 49.65 5.12 3.1
1260000 50.55 5.3 3.11
1280000 50.56 5.8 3.17
1300000 52.35 5.7 3.17
1320000 53.67 5.43 3.14
1340000 52.38 5.79 3.26
1360000 51.44 6.03 3.18
1380000 52.69 6.16 3.08
1400000 53.88 5.67 3.04
1420000 52.61 5.86 3.2
1440000 53.61 5.92 3.01
1460000 52.98 5.98 3.08
1480000 54.74 6.27 3.18
1500000 55.4 6.81 3.18
1520000 58.45 6.54 3.12
1540000 57.14 6.65 3.12
1560000 58.92 6.97 3.17
1580000 58.17 6.86 3.22
1600000 58.29 6.83 3.15
1620000 58.19 6.6 3.18
1640000 58.77 7.3 3.16
1660000 60.59 7.44 3.22
1680000 61.12 7.14 3.18
1700000 61.25 7.41 3.33
1720000 62.2 7.18 3.21
1740000 61.48 7.95 3.02
1760000 62.63 7.91 3.17
1780000 62.58 7.52 3.19
1800000 62.28 7.56 3.14
1820000 64.2 7.62 3.33
1840000 64.18 8.6 3.17
1860000 64.34 8.71 3.33
1880000 63.93 8.44 3.22
1900000 62.98 8.13 3.3
1920000 63.9 8.45 3.29
1940000 64.85 8.36 3.21
1960000 68.26 10.75 4.81
1980000 67.81 9.46 3.17
---------- code ----------
# Time one SQL batch file against each target and return the three
# timings (seconds) as a list: default-cache sqlite, large-cache
# sqlite, mysql.
proc runtest {sqlfile} {
    set result {}
    set delay 1000
    # Flush OS buffers and pause briefly before each timed run.
    exec sync; after $delay
    set t [time "exec sqlite3 def.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    exec sync; after $delay
    set t [time "exec sqlite3 lc.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    exec sync; after $delay
    set t [time "exec mysql -f drh <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    return $result
}
# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f s*.db}}
set fd [open clear.sql w]
puts $fd {
drop table t1;
drop table t2;
}
close $fd
catch {exec mysql drh <clear.sql}
set fd [open 2kinit.sql w]
puts $fd {
PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql
set fd [open nosync-init.sql w]
puts $fd {
PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql
# initialize
set fd [open test.sql w]
puts $fd "BEGIN;"
puts $fd "CREATE TABLE t1(a INTEGER, b INTEGER);"
puts $fd "CREATE INDEX tq_b on t1(b);"
puts $fd "COMMIT;"
close $fd
runtest test.sql
set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\tmysql"
set step 20000
for {set num 0} {$num < 2000000} {incr num $step} {
    puts $num
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    for {set i 1} {$i<=$step} {incr i} {
        set r [expr {int(rand()*500000)}]
        puts $fd "INSERT INTO t1 VALUES([expr {$num+$i}],$r);"
    }
    puts $fd "COMMIT;"
    close $fd
    puts $o $num\t[join [runtest test.sql] \t]
    flush $o
}
close $o
--
Dr Peter De Rijk E-mail: [EMAIL PROTECTED]
Bioinformatics Unit Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8 Fax. +32 3 265 10 12
University of Antwerp http://www.molgen.ua.ac.be/
Universiteitsplein 1
B-2610 Antwerpen, Belgium
The glass is not half full, nor half empty. The glass is just too big.
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------