[EMAIL PROTECTED] wrote: > I asked a very similar question a few weeks ago and got a very > precise answer. You should search for that.
Thanks for the pointer! It is good to learn that a new sorter is under development. But I think my problem is a little different than yours. I think I have run into what can be called a bug, but since I'm on 3.4.2 and a new sorter is planned anyway, I won't dig too deep into this. I have found a workaround, which is to increase cache_size. I'll discuss my experience briefly to help others. You might say that it's my fault if I specify an insufficient cache_size, but I think that (1) I shouldn't really need to specify this in a "zero configuration" database engine, and (2) the behaviour is exceptional. Even with a smaller than optimal cache_size, a sorting algorithm should be able to achieve something instead of grinding to a "dynamic halt". And if it needs to give up, it should announce this and return an error rather than keep trashing for hours. Since SQLite is advertised as a "zero configuration" database engine, the need to manually specify a cache_size to perform large sorts is a bit of a surprise. Other "zero configuration" applications such as Emacs, OpenOffice or Firefox don't require the user to specify the limits for memory allocation. The default cache_size (in SQLite 3.4.2) is indeed 2000. Increasing this above 1000000 (one million) solved my problem. It takes 12 minutes to build my index, and the resulting database file is roughly 1 GB larger than before. My CPU uses one of its two cores at 100% for 12 minutes, and then outputs the result quite fast. But if I set cache_size to X percent of one million, it takes X percent of 12 minutes before full CPU utilization turns into a near halt. Here's how time(1) reports the real (wallclock), user and system time, for different values of cache_size: cache_size real user sys 2000 for ever? - - 800000 for ever? - - 850000 28m52.573s 12m3.230s 0m14.580s 900000 21m1.038s 12m6.980s 0m12.710s 920000 17m47.512s 12m6.350s 0m11.770s 960000 13m58.901s 12m6.630s 0m9.810s 1000000 12m27.358s 12m0.960s 0m9.950s 1500000 12m32.865s 12m7.890s 0m9.890s 2000000 12m36.062s 12m9.070s 0m10.740s 5000000 14m7.099s 12m1.980s 0m12.950s The amount of user process CPU time required is constant at 12 minutes and a few seconds. The difference for smaller values of cache_size is the unproductive wallclock time spent in I/O wait. Below is the output from "vmstat 15", when the limit of cache_size (insufficiently set to 500000) is hit (after roughly 6 minutes). Before the hit, the system uses 50% of its time to run user processes (cpu:us) and is 50% idle (cpu:id), because only one of two CPU cores is used. Data is read from disks (io:bi) and almost nothing is written (io:bo). After the limit is hit (fifth line), disk writes start (io:bo) and CPU time is used in iowait (cpu:wa) while CPU time for user processes falls to near zero. No swapping occurs and memory usage is not affected. procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 2 0 30600 102596 4976 2737000 0 0 1986 2 34 83 50 0 50 0 1 0 30600 104240 4996 2735356 0 0 1974 2 37 87 50 0 50 0 1 0 30600 105280 5016 2734400 0 0 2034 2 36 85 50 0 50 0 1 0 30600 105016 5524 2734328 0 0 2035 3 38 90 50 0 50 0 0 4 30600 104068 5496 2735380 0 0 854 5505 659 170 21 2 40 37 0 2 30600 104572 5488 2735196 0 0 686 4421 545 183 16 2 19 63 0 3 30600 103844 5512 2729000 0 0 489 3012 472 165 12 1 24 63 1 4 30600 104960 5844 2728328 0 0 458 2598 460 672 29 4 22 46 0 3 30600 104788 5844 2733556 0 0 292 3531 492 149 11 1 46 42 1 2 30600 103788 5848 2734940 0 0 383 2241 376 155 9 1 33 57 0 6 30600 102488 5860 2736472 0 0 222 2605 430 135 6 1 15 79 1 3 30600 104104 5876 2734608 0 0 311 1962 427 174 8 1 9 82 0 4 30600 103084 5864 2736072 0 0 220 2233 394 149 5 1 13 81 With half of the CPU work remaining to be done, but only 5% (and falling) of the available CPU power devoted to this, the job will take very long. -- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

