The last test that simulated pre-sorting 2 million rows in index order prior to insert may show optimal insert speed since it's only appending pages to the table and the index, but it may not be realistic, since you may not have sufficient memory to pre-sort all data in memory prior to insert. (Although if you have the memory to spare, this is the way to go).
Here's another test on the same machine that uses random inputs but inserts batches of 20000 pre-sorted rows at time. You can see that insert time increases with the size of the table, but it is not exponential as was the case with the original test program. You can also see that having a large sqlite cache does not significantly effect the insert timings under 2M rows as compared to the default sqlite cache. Tcl code change follows the data. tablesize sqlite sqlite-largecache 0 1.909 1.909 20000 2.019 2.048 40000 2.064 2.079 60000 2.111 2.079 80000 2.142 2.158 100000 2.157 2.157 120000 2.282 2.235 140000 2.298 2.267 160000 2.392 2.298 180000 2.439 2.314 200000 2.361 2.345 220000 2.501 2.439 240000 2.689 3.439 260000 2.704 3.423 280000 2.673 3.564 300000 2.767 3.533 320000 2.829 3.564 340000 2.876 2.657 360000 2.955 2.626 380000 3.017 3.673 400000 3.001 3.704 420000 3.048 2.767 440000 2.970 3.783 460000 3.048 2.845 480000 3.095 3.861 500000 3.157 3.861 520000 3.173 3.923 540000 3.142 3.970 560000 3.236 3.986 580000 3.220 4.002 600000 3.408 4.970 620000 3.267 4.142 640000 3.423 5.017 660000 3.454 4.189 680000 3.611 4.220 700000 3.611 5.142 720000 3.704 5.251 740000 3.829 4.330 760000 3.657 4.330 780000 3.970 4.455 800000 4.001 4.470 820000 4.046 4.454 840000 4.173 4.470 860000 4.142 4.611 880000 4.126 4.548 900000 4.298 4.501 920000 4.517 4.642 940000 4.641 4.689 960000 4.517 5.579 980000 4.673 4.736 1000000 4.736 4.751 1020000 4.689 4.845 1040000 4.923 4.798 1060000 5.111 4.939 1080000 5.204 4.314 1100000 5.048 4.876 1120000 5.064 5.814 1140000 5.205 5.876 1160000 5.314 5.876 1180000 5.501 5.142 1200000 6.517 5.032 1220000 5.736 5.048 1240000 5.861 6.986 1260000 5.892 5.986 1280000 5.767 5.298 1300000 6.095 6.095 1320000 6.080 5.329 1340000 5.954 5.283 1360000 6.548 5.986 1380000 6.157 5.423 1400000 6.642 6.330 1420000 6.439 5.643 1440000 6.705 5.579 1460000 6.626 5.611 1480000 6.814 6.517 1500000 6.720 6.533 1520000 7.111 6.579 1540000 7.142 5.845 1560000 7.533 6.173 1580000 7.345 6.064 1600000 7.689 6.142 1620000 7.798 6.283 1640000 7.876 6.486 1660000 7.704 6.408 1680000 8.283 6.470 1700000 8.455 6.517 1720000 8.126 8.049 1740000 8.314 7.720 1760000 8.564 7.017 1780000 8.845 7.095 1800000 8.814 7.079 1820000 9.126 7.002 1840000 8.798 7.174 1860000 9.189 7.236 1880000 9.111 7.184 1900000 9.986 11.830 1920000 9.658 7.673 1940000 9.876 9.126 1960000 9.892 7.908 1980000 9.626 7.580 set o [open results.csv w] puts $o "tablesize\tsqlite\tsqlite-largecache\t mysql" set step 20000 for {set num 0} {$num < 2000000} {incr num $step} { puts $num set fd [open test.sql w] set rows [list] for {set i 1} {$i<=$step} {incr i} { lappend rows [list [expr {$num+$i}] [expr {int(rand()*500000)}]] } puts $fd "BEGIN;" foreach i [lsort -integer -index 1 $rows] { puts $fd "INSERT INTO t1 VALUES([lindex $i 0], [lindex $i 1]);" } puts $fd "COMMIT;" close $fd puts $o $num\t[join [runtest test.sql] \t] flush $o } close $o ----- Original Message ---- From: Joe Wilson <[EMAIL PROTECTED]> To: sqlite-users@sqlite.org Sent: Saturday, October 28, 2006 1:08:20 PM Subject: Re: [sqlite] serious performance problems with indexes If you pre-sort the data prior to inserting into sqlite you will see much better timings for both the default cache size and the larger cache size. When I run your script (unmodified, except for the removal of MySQL) up to 400000: tablesize sqlite sqlite-largecache 0 1.956 1.956 20000 2.064 2.079 40000 2.095 2.111 60000 2.173 2.173 80000 2.189 2.283 100000 2.345 2.298 120000 2.345 2.360 140000 2.407 2.361 160000 2.470 2.423 180000 2.548 2.501 200000 2.564 2.439 220000 2.830 2.439 240000 2.876 2.907 260000 2.829 2.564 280000 3.423 3.533 300000 4.939 3.564 320000 7.236 3.736 340000 7.283 3.751 360000 10.611 3.767 380000 11.142 3.845 400000 13.736 3.798 When I make the following change to your script to simulate an ordered data set (okay, it's cheating slightly - no pre-sort need be performed here): # set r [expr {int(rand()*500000)}] set r [expr {($num+$i)*5}] tablesize sqlite sqlite-largecache 0 1.878 1.894 20000 1.925 1.923 40000 1.923 1.923 60000 1.923 1.954 80000 1.970 1.923 100000 1.923 1.908 120000 1.923 1.970 140000 1.940 2.383 160000 2.048 1.908 180000 1.923 2.002 200000 1.923 1.923 220000 1.939 1.923 240000 1.923 1.923 260000 1.939 1.923 280000 1.939 1.939 300000 1.954 1.939 320000 1.939 1.923 340000 1.970 1.954 360000 1.939 1.954 380000 1.923 1.939 400000 1.970 1.939 ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------