On Friday 27 October 2006 15:31, [EMAIL PROTECTED] wrote:
> When a table is indexed, INSERT performance is logarithmic in the
> number of rows in the table and linear in the number of indices.
> This is because entries have to be inserted into the index in
> sorted order (otherwise it wouldn't be an index).  And each
> insert thus requires a binary search.
>
> If your index becomes very large so that it no longer fits
> in your disk cache, then the binary search can involve a lot
> of disk I/O which can make things really slow.  The usual
> work-around here is to keep a much smaller staging table into
> which you do your inserts and then periodically merge the
> staging table into the main table.  This makes your queries
> more complex (and slower) since you are now having to
> search multiple tables.  But it does speed up inserts.
> --
> D. Richard Hipp  <[EMAIL PROTECTED]>
>
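
(For reference, the staging-table workaround described above could
look roughly like the SQL below; the table and column names are just
for illustration, and the merge would be run periodically, e.g.
whenever the staging table grows past some threshold.)

  -- main table, with the index that makes direct inserts expensive
  CREATE TABLE main_t(a INTEGER, b INTEGER);
  CREATE INDEX main_t_b ON main_t(b);

  -- small staging table, no index, so inserts stay cheap
  CREATE TABLE staging_t(a INTEGER, b INTEGER);

  -- normal inserts go to the staging table
  INSERT INTO staging_t VALUES(1,12345);

  -- periodic merge into the main table, done in one transaction
  BEGIN;
  INSERT INTO main_t SELECT * FROM staging_t;
  DELETE FROM staging_t;
  COMMIT;

  -- queries now have to look in both tables
  SELECT * FROM main_t WHERE b=12345
  UNION ALL
  SELECT * FROM staging_t WHERE b=12345;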

I have checked at a finer granularity (results and code at the end of
the mail), and it does indeed seem that the steep growth in insert
time only starts at around 160000 rows (on my system). Before that
point the time also rises, but much more slowly. As this looked like
a cache effect, I tried setting default_cache_size to something large
(20000 pages). That was enough to remove the bend (at least up to
2000000 rows), so the insert time keeps growing at the same slower
pace as for the smaller data sets.
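
(For anyone reproducing this: the cache setting is just a pragma run
once against the database file. A minimal sketch; assuming the
default 1024-byte page size, 20000 pages is roughly 20 MB of cache:)

  -- store a larger default page-cache size in the database file
  PRAGMA default_cache_size=20000;

  -- or raise it for the current connection only
  PRAGMA cache_size=20000;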

Curiously, mysql, which I included as a reference, manages to keep
nearly the same insert time even for the largest data sets. With the
large cache, sqlite is faster than mysql up to about 700000 rows. At
2000000 rows mysql is about 3 times faster than sqlite with the large
cache (and more than 20 times faster than sqlite with the default
cache size).

------- results (times in seconds per batch of 20000 inserts) -------
tablesize       sqlite  sqlite-largecache        mysql
0       1.12    1.04    2.86
20000   1.09    1.17    2.89
40000   1.14    1.15    3.27
60000   1.23    1.2     3.18
80000   1.47    1.25    3.01
100000  1.3     1.32    2.94
120000  1.4     1.34    2.94
140000  2.06    1.38    3
160000  2.41    1.46    2.99
180000  3.09    1.54    3.03
200000  3.31    1.62    3.04
220000  3.84    1.72    3.06
240000  4.22    1.7     3.03
260000  4.23    1.76    3.06
280000  4.83    1.94    2.98
300000  6.27    2.08    3.07
320000  9.02    2.06    3.11
340000  9.48    1.94    2.88
360000  10      2.13    3.04
380000  10.6    2.21    3.13
400000  10.72   2.35    3.05
420000  11.35   2.24    3.11
440000  11.55   2.49    3.05
460000  11.75   2.5     3.07
480000  16.89   2.39    3.1
500000  17.81   2.59    3.09
520000  19.22   2.4     3.03
540000  19.35   2.77    3.05
560000  19.44   2.59    3.1
580000  19.87   2.67    3.08
600000  21.47   2.88    3.09
620000  20.82   2.74    3.08
640000  21.55   3       3.11
660000  25.17   3.09    3.12
680000  29.89   2.84    3.16
700000  28.48   3.25    3.09
720000  28.62   3.31    3.2
740000  28.69   3.23    3.07
760000  30.34   3.59    3.1
780000  30.27   3.64    3.13
800000  30.57   3.37    3.14
820000  31.41   3.57    3.06
840000  31.89   3.89    3.09
860000  33.77   3.57    3.05
880000  36.01   3.87    3.1
900000  37.58   4.04    3.07
920000  40.59   3.68    3.04
940000  40.9    3.72    3.09
960000  41.45   4.24    3.1
980000  42.05   4.28    3.13
1000000 41.11   4.09    3.03
1020000 42.29   4.63    3.11
1040000 42.09   4.81    3.17
1060000 42.8    4.34    3.14
1080000 43.82   4.43    3.02
1100000 43.01   4.5     3.12
1120000 44.08   4.84    3.1
1140000 45.4    5.08    3.16
1160000 47.51   4.66    3.13
1180000 47.46   5.02    3.17
1200000 48.21   5.42    3.17
1220000 49.09   5.33    3.21
1240000 49.65   5.12    3.1
1260000 50.55   5.3     3.11
1280000 50.56   5.8     3.17
1300000 52.35   5.7     3.17
1320000 53.67   5.43    3.14
1340000 52.38   5.79    3.26
1360000 51.44   6.03    3.18
1380000 52.69   6.16    3.08
1400000 53.88   5.67    3.04
1420000 52.61   5.86    3.2
1440000 53.61   5.92    3.01
1460000 52.98   5.98    3.08
1480000 54.74   6.27    3.18
1500000 55.4    6.81    3.18
1520000 58.45   6.54    3.12
1540000 57.14   6.65    3.12
1560000 58.92   6.97    3.17
1580000 58.17   6.86    3.22
1600000 58.29   6.83    3.15
1620000 58.19   6.6     3.18
1640000 58.77   7.3     3.16
1660000 60.59   7.44    3.22
1680000 61.12   7.14    3.18
1700000 61.25   7.41    3.33
1720000 62.2    7.18    3.21
1740000 61.48   7.95    3.02
1760000 62.63   7.91    3.17
1780000 62.58   7.52    3.19
1800000 62.28   7.56    3.14
1820000 64.2    7.62    3.33
1840000 64.18   8.6     3.17
1860000 64.34   8.71    3.33
1880000 63.93   8.44    3.22
1900000 62.98   8.13    3.3
1920000 63.9    8.45    3.29
1940000 64.85   8.36    3.21
1960000 68.26   10.75   4.81
1980000 67.81   9.46    3.17

---------- code ----------
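# runtest: run the given SQL file once against each target
# (sqlite with the default cache, sqlite with the large cache, and
# mysql), syncing and pausing before each run, and return the three
# wall-clock times in seconds.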
proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay;
  set t [time "exec sqlite3 def.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay;
  set t [time "exec sqlite3 lc.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay;
  set t [time "exec mysql -f drh <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  return $result
}

# Initialize the environment
#
expr srand(1)
# remove any database files left over from a previous run
catch {exec /bin/sh -c {rm -f def.db lc.db}}
set fd [open clear.sql w]
puts $fd {
  drop table t1;
  drop table t2;
}
close $fd
catch {exec mysql drh <clear.sql}
set fd [open 2kinit.sql w]
puts $fd {
  PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql
set fd [open nosync-init.sql w]
puts $fd {
  PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql
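# (despite its name, nosync-init.sql only enlarges the cache;
# synchronous mode is left at its default)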

# initialize
set fd [open test.sql w]
puts $fd "BEGIN;"
puts $fd "CREATE TABLE t1(a INTEGER, b INTEGER);"
puts $fd "CREATE INDEX tq_b on t1(b);"
puts $fd "COMMIT;"
close $fd
runtest test.sql

set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\t mysql"
set step 20000
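# each pass appends $step rows: column a gets sequential values,
# column b gets random values in [0,500000), so the index on b is
# updated at random positions in its b-tree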
for {set num 0} {$num < 2000000} {incr num $step} {
        puts $num
        set fd [open test.sql w]
        puts $fd "BEGIN;"
        for {set i 1} {$i<=$step} {incr i} {
                set r [expr {int(rand()*500000)}]
                puts $fd "INSERT INTO t1 VALUES([expr {$num+$i}],$r);"
        }
        puts $fd "COMMIT;"
        close $fd
        puts $o $num\t[join [runtest test.sql] \t]
        flush $o
}
close $o

-- 
Dr Peter De Rijk                         E-mail: [EMAIL PROTECTED]
Bioinformatics Unit                          Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8        Fax. +32 3 265 10 12
University of Antwerp                http://www.molgen.ua.ac.be/
Universiteitsplein 1                 
B-2610 Antwerpen, Belgium

The glass is not half full, nor half empty. The glass is just too big.
