On 7 Jul 2010, at 4:14pm, Jay A. Kreibich wrote:

>> (I guess it well might not on an SSD disk, but on a conventional
>> rotational disk, the pager could read several pages ahead with one
>> seek - but does it?)
>
> No, the pager does not.  Among other things, my feeling is that the
> locality of pages is not very strong, unless the database was just
> VACUUMed.
Actually the SSD possibility makes it worse, not better.  A simplified explanation follows.

SSD units use separate sets of circuits for each memory chip.  Suppose for the sake of argument that an 80 Gig SSD drive contains ten 8 Gig memory chips, circuitry around each one, and master circuits to handle the hard drive interface.  And suppose you want to write two blocks to this drive, and you do it with two separate 'fwrite' commands.  The first command happens to be for a block which is stored in chip number 3 of 10.  The drive accepts the write command, routes the data to chip 3, then tells the OS that it's ready for another command.

Suppose that the second command also refers to data stored in chip 3.  The circuitry around chip 3 is still busy handling the first command, so the drive has to wait for that circuitry to become free before it can send the second command to it.  On the other hand, suppose the second write concerns data held in chip 5 instead.  That can be executed immediately: chip 5 isn't busy.

"Well," you ask, "if that's how it works, why don't SSD drives stripe the data so that contiguous blocks are spread over the different chips?"  Because if a single command is for data that spans several contiguous blocks, it's faster to send one instruction to one memory chip and handle one long chunk of data than to have to talk to several different chips and assemble the data into one chunk before returning it.  So it's swings and roundabouts.

So for SSD drives, or any system where a 'drive' is made up of separate storage units, locality can make things worse, not better.

Add to that the fact that fragmentation is only really a problem for Windows these days.  Windows filesystem drivers rely heavily on read-ahead logic for speed, and both NTFS and the Windows drivers were designed with that in mind, so they cope well and give pretty fast results.
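To make the queueing effect above concrete, here's a toy model in Python.  The chip count and per-write cost are invented numbers, and real SSD controllers are vastly more complicated, so treat this as a sketch of the idea rather than of real hardware:

```python
# Toy model of the SSD queueing effect described above.  Each flash chip
# services one command at a time, so writes that land on the same chip
# serialize, while writes routed to different chips proceed in parallel.
# NUM_CHIPS and WRITE_COST are invented numbers, for illustration only.

NUM_CHIPS = 10
WRITE_COST = 1.0   # time units one chip needs per block write

def total_time(chip_for_block, blocks):
    """Completion time when the drive accepts commands back to back,
    but each chip works through its own queue serially."""
    busy = [0.0] * NUM_CHIPS
    for b in blocks:
        busy[chip_for_block(b)] += WRITE_COST
    return max(busy)

eight_blocks = range(8)

# Contiguous layout: eight consecutive blocks all live on one chip,
# so the eight writes queue up behind one another.
contiguous = total_time(lambda b: b // 1_000_000, eight_blocks)

# Striped layout: consecutive blocks are spread across the chips,
# so all eight writes overlap.
striped = total_time(lambda b: b % NUM_CHIPS, eight_blocks)

print(f"contiguous layout: {contiguous} time units")  # 8.0
print(f"striped layout:    {striped} time units")     # 1.0
```

The flip side from the paragraph above - one long transfer being cheaper from a single chip than reassembled from several - is exactly the overhead this model deliberately leaves out, which is why real firmware has to compromise.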
On the other hand, Macs and other forms of Unix and Linux systems fragment constantly, because all forms of Unix run background daemons which are forever reading and writing log files and journals irrespective of what your 'main' application is doing.  Drivers and file systems designed for systems like that were built with different assumptions and don't degrade much with fragmentation.  Not to mention that a few hours after you've run a defragmentation program you'll find you're fragmented again, because of all the background stuff that's going on.

So the urge towards defragmentation is a typical example of how someone who knows a bit about how computers work -- "Fragmentation bad!  Caveman not like!" -- will spend hours on an optimization that doesn't save much of anything.  And fragmentation is not an unusual example: many things which naively look like they should be optimizations actually aren't.

So write simple clean code first.  Only if you've decided it's not good enough is it worth improving it.  And when you write a newer version which is meant to be better, compare the two to check that it really is.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users