On Wed, Jul 07, 2010 at 01:03:26PM +0400, Igor Sereda scratched on the wall:

> > It's never time to VACUUM a database.
>
> This is an interesting statement. In our application, all tables get
> heavily fragmented with time (99% or more). I was considering VACUUM
> as a means to defragment, and, presumably, improve search performance.
> Was I misguided, and search performance does not degrade significantly
> with the increased fragmentation of the DB?
No, you're not misguided, but the full picture is deeper.

VACUUM will repack the pages, putting more user data into a smaller
footprint. This improves I/O performance, since you pull in more usable
data with a given number of reads (at least, until things get mixed up
again). This typically only shows up when doing large scans of tables
that have been really thrashed, however (rows added and deleted and
added and deleted, over and over and over).

VACUUM also puts all the pages of a given database object in the same
spot in the file. That will help with reads, but only if the low-level
block assignments on the disk are contiguous. Just because the blocks
are next to each other in the file doesn't mean they're next to each
other on the disk.

In theory, VACUUM also gives the filesystem an opportunity to
defragment the file itself, but it is less clear how much this actually
happens. VACUUM rebuilds the database into a temp file and then copies
it back, but the copy is done as a low-level, page-by-page overwrite--
it does not actually delete the old file and then copy or rename the
rebuilt one. This makes the whole operation transaction-safe (complete
with roll-back ability right up to the final moment of success), but it
means the original database file stays more or less intact on the
filesystem, and is simply overwritten one section at a time. Some
filesystems will take the chance to merge the file blocks; some will
not.

If you really want the best performance, your best bet is to copy the
file after it has been VACUUMed. Or defragment the filesystem. Or use
an auto-defragmenting filesystem.

> (I guess it well might not on an SSD disk, but on a conventional
> rotational disk, pager could read several pages ahead with one seek -
> but does it?)

No, the pager does not. Among other things, my feeling is that the
locality of pages is not very strong unless the database was just
VACUUMed. Memory for the page cache also tends to be very guarded.
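To make the repacking concrete, here is a minimal Python sketch (not from the original thread; the table name, payload size, and row counts are arbitrary illustration values). It thrashes a table -- bulk insert, then delete most rows -- and uses PRAGMA page_count and PRAGMA freelist_count to observe the footprint before and after VACUUM:

```python
import os
import sqlite3
import tempfile

# Hypothetical example: thrash a table, then watch VACUUM repack it.
path = os.path.join(tempfile.mkdtemp(), "thrash.db")

# Autocommit mode: VACUUM cannot run inside an open transaction.
db = sqlite3.connect(path, isolation_level=None)
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")

# Bulk insert in one explicit transaction.
db.execute("BEGIN")
db.executemany("INSERT INTO t (payload) VALUES (?)",
               [(b"x" * 1000,) for _ in range(2000)])
db.execute("COMMIT")

# Delete 90% of the rows, leaving underfilled pages and free pages.
db.execute("DELETE FROM t WHERE id % 10 != 0")

pages_before = db.execute("PRAGMA page_count").fetchone()[0]
free_before = db.execute("PRAGMA freelist_count").fetchone()[0]

# Rebuilds into a temp file, then copies back page by page.
db.execute("VACUUM")

pages_after = db.execute("PRAGMA page_count").fetchone()[0]
free_after = db.execute("PRAGMA freelist_count").fetchone()[0]

print("pages: %d -> %d, free pages: %d -> %d"
      % (pages_before, pages_after, free_before, free_after))
db.close()
```

After the VACUUM, the freelist is empty and the page count drops to roughly what the surviving rows actually need -- the smaller footprint described above.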
On a modern desktop system, it is likely the filesystem and/or the disk
controller will do some level of read-ahead, however. Any benefit will
likely require a contiguous physical layout.

Overall, I agree with most others... if you're just starting
application development, this should be way, way down on your list of
concerns. It is likely this concern can even drop off the list, unless
you know your files are significant in size, the database lifespan is
significant, and the database will be subjected to an extremely high
amount of flux and row thrashing.

  -j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable."
                                        -- Angela Johnson

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

