The last test that simulated pre-sorting 2 million rows 
in index order prior to insert may show optimal insert speed
since it's only appending pages to the table and the index, 
but it may not be realistic, since you may not have sufficient 
memory to pre-sort all data in memory prior to insert.
(Although if you have the memory to spare, this is the way to go).

Here's another test on the same machine that uses 
random inputs but inserts batches of 20000 pre-sorted 
rows at time. You can see that insert time increases with 
the size of the table, but it is not exponential as was the
case with the original test program.

You can also see that having a large sqlite cache does
not significantly effect the insert timings under 2M rows 
as compared to the default sqlite cache.

Tcl code change follows the data.

tablesize       sqlite  sqlite-largecache
0       1.909   1.909
20000   2.019   2.048
40000   2.064   2.079
60000   2.111   2.079
80000   2.142   2.158
100000  2.157   2.157
120000  2.282   2.235
140000  2.298   2.267
160000  2.392   2.298
180000  2.439   2.314
200000  2.361   2.345
220000  2.501   2.439
240000  2.689   3.439
260000  2.704   3.423
280000  2.673   3.564
300000  2.767   3.533
320000  2.829   3.564
340000  2.876   2.657
360000  2.955   2.626
380000  3.017   3.673
400000  3.001   3.704
420000  3.048   2.767
440000  2.970   3.783
460000  3.048   2.845
480000  3.095   3.861
500000  3.157   3.861
520000  3.173   3.923
540000  3.142   3.970
560000  3.236   3.986
580000  3.220   4.002
600000  3.408   4.970
620000  3.267   4.142
640000  3.423   5.017
660000  3.454   4.189
680000  3.611   4.220
700000  3.611   5.142
720000  3.704   5.251
740000  3.829   4.330
760000  3.657   4.330
780000  3.970   4.455
800000  4.001   4.470
820000  4.046   4.454
840000  4.173   4.470
860000  4.142   4.611
880000  4.126   4.548
900000  4.298   4.501
920000  4.517   4.642
940000  4.641   4.689
960000  4.517   5.579
980000  4.673   4.736
1000000 4.736   4.751
1020000 4.689   4.845
1040000 4.923   4.798
1060000 5.111   4.939
1080000 5.204   4.314
1100000 5.048   4.876
1120000 5.064   5.814
1140000 5.205   5.876
1160000 5.314   5.876
1180000 5.501   5.142
1200000 6.517   5.032
1220000 5.736   5.048
1240000 5.861   6.986
1260000 5.892   5.986
1280000 5.767   5.298
1300000 6.095   6.095
1320000 6.080   5.329
1340000 5.954   5.283
1360000 6.548   5.986
1380000 6.157   5.423
1400000 6.642   6.330
1420000 6.439   5.643
1440000 6.705   5.579
1460000 6.626   5.611
1480000 6.814   6.517
1500000 6.720   6.533
1520000 7.111   6.579
1540000 7.142   5.845
1560000 7.533   6.173
1580000 7.345   6.064
1600000 7.689   6.142
1620000 7.798   6.283
1640000 7.876   6.486
1660000 7.704   6.408
1680000 8.283   6.470
1700000 8.455   6.517
1720000 8.126   8.049
1740000 8.314   7.720
1760000 8.564   7.017
1780000 8.845   7.095
1800000 8.814   7.079
1820000 9.126   7.002
1840000 8.798   7.174
1860000 9.189   7.236
1880000 9.111   7.184
1900000 9.986   11.830
1920000 9.658   7.673
1940000 9.876   9.126
1960000 9.892   7.908
1980000 9.626   7.580

set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\t mysql"
set step 20000
for {set num 0} {$num < 2000000} {incr num $step} {
  puts $num
  set fd [open test.sql w]
  set rows [list]
  for {set i 1} {$i<=$step} {incr i} {
    lappend rows [list [expr {$num+$i}] [expr {int(rand()*500000)}]]
  }
  puts $fd "BEGIN;"
  foreach i [lsort -integer -index 1 $rows] {
    puts $fd "INSERT INTO t1 VALUES([lindex $i 0], [lindex $i 1]);"
  }
  puts $fd "COMMIT;"
  close $fd
  puts $o $num\t[join [runtest test.sql] \t]
  flush $o
}
close $o


----- Original Message ----
From: Joe Wilson <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 1:08:20 PM
Subject: Re: [sqlite] serious performance problems with indexes

If you pre-sort the data prior to inserting into sqlite you will
see much better timings for both the default cache size and
the larger cache size.

When I run your script (unmodified, except for the removal of 
MySQL) up to 400000:

tablesize       sqlite  sqlite-largecache
0       1.956   1.956
20000   2.064   2.079
40000   2.095   2.111
60000   2.173   2.173
80000   2.189   2.283
100000  2.345   2.298
120000  2.345   2.360
140000  2.407   2.361
160000  2.470   2.423
180000  2.548   2.501
200000  2.564   2.439
220000  2.830   2.439
240000  2.876   2.907
260000  2.829   2.564
280000  3.423   3.533
300000  4.939   3.564
320000  7.236   3.736
340000  7.283   3.751
360000  10.611  3.767
380000  11.142  3.845
400000  13.736  3.798

When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly - no pre-sort need be performed here):

               # set r [expr {int(rand()*500000)}]
               set r [expr {($num+$i)*5}]

tablesize       sqlite  sqlite-largecache
0       1.878   1.894
20000   1.925   1.923
40000   1.923   1.923
60000   1.923   1.954
80000   1.970   1.923
100000  1.923   1.908
120000  1.923   1.970
140000  1.940   2.383
160000  2.048   1.908
180000  1.923   2.002
200000  1.923   1.923
220000  1.939   1.923
240000  1.923   1.923
260000  1.939   1.923
280000  1.939   1.939
300000  1.954   1.939
320000  1.939   1.923
340000  1.970   1.954
360000  1.939   1.954
380000  1.923   1.939
400000  1.970   1.939








-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to