If you pre-sort the data before inserting it into SQLite, you will
see much better timings for both the default cache size and
the larger cache size.

When I run your script (unmodified, except for the removal of 
MySQL) up to 400000:

tablesize       sqlite  sqlite-largecache
0       1.956   1.956
20000   2.064   2.079
40000   2.095   2.111
60000   2.173   2.173
80000   2.189   2.283
100000  2.345   2.298
120000  2.345   2.360
140000  2.407   2.361
160000  2.470   2.423
180000  2.548   2.501
200000  2.564   2.439
220000  2.830   2.439
240000  2.876   2.907
260000  2.829   2.564
280000  3.423   3.533
300000  4.939   3.564
320000  7.236   3.736
340000  7.283   3.751
360000  10.611  3.767
380000  11.142  3.845
400000  13.736  3.798

When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly - no pre-sort need be performed here):

               # set r [expr {int(rand()*500000)}]
               set r [expr {($num+$i)*5}]

tablesize       sqlite  sqlite-largecache
0       1.878   1.894
20000   1.925   1.923
40000   1.923   1.923
60000   1.923   1.954
80000   1.970   1.923
100000  1.923   1.908
120000  1.923   1.970
140000  1.940   2.383
160000  2.048   1.908
180000  1.923   2.002
200000  1.923   1.923
220000  1.939   1.923
240000  1.923   1.923
260000  1.939   1.923
280000  1.939   1.939
300000  1.954   1.939
320000  1.939   1.923
340000  1.970   1.954
360000  1.939   1.954
380000  1.923   1.939
400000  1.970   1.939

I wonder if other databases pre-sort their batch inserts by 
index order to achieve more consistent insert speed with 
datasets that exceed the cache size.


----- Original Message ----
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 10:40:03 AM
Subject: Re: [sqlite] serious performance problems with indexes

I have checked at a finer grain (results and code at the end of the 
mail), and it seems indeed that the extreme growth in the time needed 
for inserts only starts at around 160000 rows (on my system). Before 
that it also rises, but much more slowly. As this indeed looked 
like a cache effect, I tried setting the default_cache_size 
to something big (20000). This was enough to remove the bend (at 
least up to 2000000 rows), so that the increase in time kept going 
at the same slower pace as for the smaller data sizes.
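The cache change described here amounts to a one-line PRAGMA (the Tcl script at the end writes it into nosync-init.sql); as a minimal illustration in Python rather than Tcl:

```python
# Minimal sketch of the cache-size change described above.  SQLite's page
# cache defaults to a small number of pages (2000 in the "default" run);
# raising it lets more of the index b-tree stay in memory during large
# batched inserts.
import sqlite3

con = sqlite3.connect(":memory:")
# PRAGMA default_cache_size (used in the script below) persisted the value
# in the database file; PRAGMA cache_size sets it for this connection only.
con.execute("PRAGMA cache_size=20000")
(pages,) = con.execute("PRAGMA cache_size").fetchone()
print(pages)  # 20000
con.close()
```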

Curiously, MySQL, which I included as a reference, manages to 
keep nearly the same insert time even for the largest data sets.
Using the large cache size, SQLite is faster than MySQL until about 
700000 rows. At 2000000 rows MySQL is about 3 times faster than SQLite
with the large cache (versus more than 20 times faster than SQLite with 
the default cache size).

------- 
tablesize    sqlite    sqlite-largecache     mysql
0    1.12    1.04    2.86
20000    1.09    1.17    2.89
40000    1.14    1.15    3.27
60000    1.23    1.2    3.18
80000    1.47    1.25    3.01
100000    1.3    1.32    2.94
120000    1.4    1.34    2.94
140000    2.06    1.38    3
160000    2.41    1.46    2.99
180000    3.09    1.54    3.03
200000    3.31    1.62    3.04
220000    3.84    1.72    3.06
240000    4.22    1.7    3.03
260000    4.23    1.76    3.06
280000    4.83    1.94    2.98
300000    6.27    2.08    3.07
320000    9.02    2.06    3.11
340000    9.48    1.94    2.88
360000    10    2.13    3.04
380000    10.6    2.21    3.13
400000    10.72    2.35    3.05
420000    11.35    2.24    3.11
440000    11.55    2.49    3.05
460000    11.75    2.5    3.07
480000    16.89    2.39    3.1
500000    17.81    2.59    3.09
520000    19.22    2.4    3.03
540000    19.35    2.77    3.05
560000    19.44    2.59    3.1
580000    19.87    2.67    3.08
600000    21.47    2.88    3.09
620000    20.82    2.74    3.08
640000    21.55    3    3.11
660000    25.17    3.09    3.12
680000    29.89    2.84    3.16
700000    28.48    3.25    3.09
720000    28.62    3.31    3.2
740000    28.69    3.23    3.07
760000    30.34    3.59    3.1
780000    30.27    3.64    3.13
800000    30.57    3.37    3.14
820000    31.41    3.57    3.06
840000    31.89    3.89    3.09
860000    33.77    3.57    3.05
880000    36.01    3.87    3.1
900000    37.58    4.04    3.07
920000    40.59    3.68    3.04
940000    40.9    3.72    3.09
960000    41.45    4.24    3.1
980000    42.05    4.28    3.13
1000000    41.11    4.09    3.03
1020000    42.29    4.63    3.11
1040000    42.09    4.81    3.17
1060000    42.8    4.34    3.14
1080000    43.82    4.43    3.02
1100000    43.01    4.5    3.12
1120000    44.08    4.84    3.1
1140000    45.4    5.08    3.16
1160000    47.51    4.66    3.13
1180000    47.46    5.02    3.17
1200000    48.21    5.42    3.17
1220000    49.09    5.33    3.21
1240000    49.65    5.12    3.1
1260000    50.55    5.3    3.11
1280000    50.56    5.8    3.17
1300000    52.35    5.7    3.17
1320000    53.67    5.43    3.14
1340000    52.38    5.79    3.26
1360000    51.44    6.03    3.18
1380000    52.69    6.16    3.08
1400000    53.88    5.67    3.04
1420000    52.61    5.86    3.2
1440000    53.61    5.92    3.01
1460000    52.98    5.98    3.08
1480000    54.74    6.27    3.18
1500000    55.4    6.81    3.18
1520000    58.45    6.54    3.12
1540000    57.14    6.65    3.12
1560000    58.92    6.97    3.17
1580000    58.17    6.86    3.22
1600000    58.29    6.83    3.15
1620000    58.19    6.6    3.18
1640000    58.77    7.3    3.16
1660000    60.59    7.44    3.22
1680000    61.12    7.14    3.18
1700000    61.25    7.41    3.33
1720000    62.2    7.18    3.21
1740000    61.48    7.95    3.02
1760000    62.63    7.91    3.17
1780000    62.58    7.52    3.19
1800000    62.28    7.56    3.14
1820000    64.2    7.62    3.33
1840000    64.18    8.6    3.17
1860000    64.34    8.71    3.33
1880000    63.93    8.44    3.22
1900000    62.98    8.13    3.3
1920000    63.9    8.45    3.29
1940000    64.85    8.36    3.21
1960000    68.26    10.75    4.81
1980000    67.81    9.46    3.17

---------- code ----------
proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay;
  set t [time "exec sqlite3 def.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay;
  set t [time "exec sqlite3 lc.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay;
  set t [time "exec mysql -f drh <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  return $result
}

# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f def.db lc.db}}  ;# remove databases from any previous run
set fd [open clear.sql w]
puts $fd {
  drop table t1;
  drop table t2;
}
close $fd
catch {exec mysql drh <clear.sql}
set fd [open 2kinit.sql w]
puts $fd {
  PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql
set fd [open nosync-init.sql w]
puts $fd {
  PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql

# initialize
set fd [open test.sql w]
puts $fd "BEGIN;"
puts $fd "CREATE TABLE t1(a INTEGER, b INTEGER);"
puts $fd "CREATE INDEX tq_b on t1(b);"
puts $fd "COMMIT;"
close $fd
runtest test.sql

set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\tmysql"
set step 20000
for {set num 0} {$num < 2000000} {incr num $step} {
    puts $num
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    for {set i 1} {$i<=$step} {incr i} {
        set r [expr {int(rand()*500000)}]
        puts $fd "INSERT INTO t1 VALUES([expr {$num+$i}],$r);"
    }
    puts $fd "COMMIT;"
    close $fd
    puts $o $num\t[join [runtest test.sql] \t]
    flush $o
}
close $o

-- 
Dr Peter De Rijk                         E-mail: [EMAIL PROTECTED]
Bioinformatics Unit                          Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8        Fax. +32 3 265 10 12
University of Antwerp                http://www.molgen.ua.ac.be/
Universiteitsplein 1                 
B-2610 Antwerpen, Belgium

The glass is not half full, nor half empty. The glass is just too big.

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------