On Sunday 29 October 2006 18:47, Joe Wilson wrote:
> The last test that simulated pre-sorting 2 million rows
> in index order prior to insert may show optimal insert speed
> since it's only appending pages to the table and the index,
> but it may not be realistic, since you may not have sufficient
> memory to pre-sort all data in memory prior to insert.
> (Although if you have the memory to spare, this is the way to go).
Presorting is unfortunately not always possible, e.g. when
there are several indexes (with different orders). But increasing
the cache seems a good solution for this anyway (and is less
cumbersome than presorting or staging). Your ideas set me thinking
on why, even with the larger cache, we still do worse than mysql on
large datasets, and I did some further experimenting (code at the
end of the mail; all tests with the larger cache size).
I suspect that it is not only the number of entries that matters,
but that at least two different factors affect the performance:
the number of distinct values actually present in the index, and
the number of rows that share each value.
The number of distinct values has an important influence. If there
is only a limited number of possible values, performance stays
nearly constant regardless of the data size (and thus faster than
mysql). For larger numbers of possible values the behaviour gets a
bit unexpected (to me at least): the time needed for random inserts
rises roughly linearly with the data size, until it reaches a
certain point, after which the insert time remains the same
regardless of dataset size. The height of this plateau seems to be
related to the number of possible values in a linear fashion.
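One way I can picture the "few distinct values stay fast" part is a
deliberately crude page-cache model (just a sketch of my guess, not
SQLite's actual pager; the page size, cache size and LRU policy below
are all made-up numbers for illustration): if random inserts for a
given b value land near that value's region of the index, then with
numv distinct values the hot set is roughly numv/keys_per_page leaf
pages, and once that fits in the cache nearly every insert is a hit:

```python
import random
from collections import OrderedDict

def miss_rate(numv, cache_pages=100, keys_per_page=50,
              n_inserts=20000, seed=1):
    """Toy model: each insert touches the index leaf page of its
    key; pages live in an LRU cache of cache_pages slots.
    Returns the fraction of inserts that miss the cache."""
    rng = random.Random(seed)
    cache = OrderedDict()   # page number -> present (LRU order)
    misses = 0
    for _ in range(n_inserts):
        # leaf page that would hold this key's index entry
        page = rng.randrange(numv) // keys_per_page
        if page in cache:
            cache.move_to_end(page)     # mark as recently used
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)   # evict oldest page
    return misses / n_inserts
```

With 100 distinct values the hot set is a couple of pages and the
miss rate is essentially zero; with a million distinct values almost
every insert misses the 100-page cache, which would put each insert
at disk speed. It does not explain the linear rise before the
plateau, though, so I may well be missing something.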
Any ideas?
---------- code ----------
proc runtest {sqlfile} {
    set result {}
    set delay 1000
    exec sync; after $delay;
    set t [time "exec sqlite3 lc.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/1000000.0}]
    lappend result [format %.3f $t]
    return $result
}
set numvlist {2 10 100 200 300 400 500 600 700 800 1000 2000 3000 5000 7000
10000}
# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f *.db}}
set fd [open clear.sql w]
foreach numv $numvlist {
    puts $fd "drop table t$numv;"
}
close $fd
catch {exec mysql drh <clear.sql}
set fd [open 2kinit.sql w]
puts $fd {
PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql
set fd [open nosync-init.sql w]
puts $fd {
PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql
# initialize
set fd [open test.sql w]
foreach numv $numvlist {
    puts $fd "BEGIN;"
    puts $fd "CREATE TABLE t${numv}(a INTEGER, b INTEGER);"
    puts $fd "CREATE INDEX tq_b${numv} on t${numv}(b);"
    puts $fd "COMMIT;"
}
close $fd
runtest test.sql
set o [open results.csv w]
set temp "tablesize"
foreach numv $numvlist {
    # append temp \tsqlite$numv\tsqlite-lc$numv\tmysql$numv
    append temp \t$numv
}
puts $o $temp
set step 20000
foreach numv $numvlist {
    set num 0
    puts "tablesize: $num numv: $numv"
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    set r 0
    for {set i 1} {$i<=$step} {incr i} {
        puts $fd "INSERT INTO t$numv VALUES([expr {$num+$i}],$r);"
        incr r
        if {$r == $numv} {set r 0}
    }
    puts $fd "COMMIT;"
    close $fd
    eval lappend resultline [runtest test.sql]
}
puts $o $num\t[join $resultline \t]
flush $o
puts start
# start after the rows already inserted by the sequential pass above,
# so the tablesize-0 line is not measured (and its a values not
# inserted) twice
for {set num $step} {$num < 2000000} {incr num $step} {
    set resultline {}
    foreach numv $numvlist {
        puts "tablesize: $num numv: $numv"
        set fd [open test.sql w]
        puts $fd "BEGIN;"
        for {set i 1} {$i<=$step} {incr i} {
            set r [expr {int(rand()*$numv)}]
            puts $fd "INSERT INTO t$numv VALUES([expr {$num+$i}],$r);"
        }
        puts $fd "COMMIT;"
        close $fd
        eval lappend resultline [runtest test.sql]
    }
    puts $o $num\t[join $resultline \t]
    flush $o
}
close $o
--
Dr Peter De Rijk E-mail: [EMAIL PROTECTED]
Bioinformatics Unit Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8 Fax. +32 3 265 10 12
University of Antwerp http://www.molgen.ua.ac.be/
Universiteitsplein 1
B-2610 Antwerpen, Belgium
The glass is not half full, nor half empty. The glass is just too big.