I'm trying to learn more about MySQL versus SQLite when it comes to vacuuming and fragmentation, especially as it relates to SSD storage. Can anybody summarize the differences for me?
1) My understanding is that SQLite, in general, has no automatic defragmentation: there is no process that gradually and incrementally repositions data on disk in index order. (Even auto_vacuum just reclaims the space from deleted data by inserting new rows into the holes left behind by previous deletes, which the documentation says can actually worsen fragmentation.) The only defragmentation option is a full VACUUM, which rebuilds the entire database from scratch in index order; during that rebuild the whole database is locked and cannot be used. Is this roughly right? (I've appended a short sketch of the commands I mean at the end of this message.)

2) My understanding of MySQL is that the equivalent operation is OPTIMIZE TABLE. Unlike SQLite's VACUUM, which works on the whole database, it works on a single table, but it is otherwise the same: it rebuilds the entire table (and thus needs roughly 2x the disk space), locks the table during the operation, and is not incremental (it either runs to completion or not at all). Is this about right? (Sketch appended below.)

3) I've heard mention that in some configurations (perhaps with some storage engines?) MySQL doesn't need vacuuming at all -- somehow it just doesn't fragment, or it resolves fragmentation as it goes -- but I can't track down an explanation of why. Any ideas?

4) Does MySQL (or SQLite?) have any fancier defragmentation capabilities, such as a clustered primary index that never puts rows belonging to two different parent keys into the same block? For example, imagine a table of historical credit card transactions, grouped first by "cardID", where each card has a sequential list of rows identified by "transactionID". Every night, new transactions are added to each card, meaning new data would ideally be inserted throughout the entire table, rather than just appended at the very end. I could imagine a clustered index ordering all transactions for a single card back-to-back on disk, but skipping to the next full database block before storing transactions for the next card. This would intentionally leave free space so additional transactions could be added to each card in disk-sequential order, and it would explicitly avoid transactions from multiple cards being stored in the same block. It wouldn't be a complete solution -- the database blocks for a particular card wouldn't necessarily be sequential -- but the transactions inside each block *would* be sequential, and no block would contain transactions from different cards. Does this make sense? Does anything like this exist? (I've appended a sketch of the schema I have in mind at the end of this message.)

5) My understanding is that disk order matters a lot on spinning disks, because it reduces seek times when reading data with neighboring primary keys. However, am I correct in assuming that the need for this is dramatically reduced by SSDs, which allow random access? Granted, vacuuming should always help to a degree: if your rows are smaller than the block size, it's better to have neighboring rows in the same block (so you needn't read multiple blocks to get the same data). And it's always good to make the database block size match the disk block size (whether SSD or spinning) so you only need to read a single disk block per database block (example appended below). But in a world of SSDs and a tremendous amount of RAM, does vacuuming matter nearly as much as it does on a spinning disk with constrained RAM?

Thanks, I really appreciate your thoughts and links to reading material!

-david
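P.S. To make question 1) concrete, here are the SQLite commands I'm asking about, with my (possibly wrong) understanding of each in the comments. The pragma names are from the SQLite documentation; whether they behave as I've described is exactly what I'd like confirmed or corrected:

    -- Only takes effect on a new database, or after the next full VACUUM.
    PRAGMA auto_vacuum = INCREMENTAL;

    -- Return up to 100 free pages to the filesystem. My understanding is
    -- this reclaims space but does not reorder data, so it doesn't
    -- defragment (and may make fragmentation worse).
    PRAGMA incremental_vacuum(100);

    -- Rebuild the whole database file from scratch: the only real
    -- defragmentation, needing temporary space for a full copy and
    -- blocking other use of the database while it runs.
    VACUUM;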
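For question 2), the MySQL side looks like this in my head. I'm assuming an InnoDB table called "transactions" (the name is just for illustration), and I'm not certain the two statements really are equivalent:

    -- Rebuild one table and its indexes; needs disk space for a full
    -- copy of the table while it runs.
    OPTIMIZE TABLE transactions;

    -- My understanding is that for InnoDB, OPTIMIZE TABLE is essentially
    -- an implicit table rebuild, which can also be requested explicitly:
    ALTER TABLE transactions ENGINE=InnoDB;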
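For question 4), the kind of schema I have in mind is below. The table and column names are hypothetical. The part I can express is the composite primary key (since I believe InnoDB stores rows in primary-key order); the part I don't know how to express is "start each card on a fresh page and leave the rest of the page empty":

    CREATE TABLE transactions (
        card_id        BIGINT UNSIGNED NOT NULL,
        transaction_id BIGINT UNSIGNED NOT NULL,
        amount_cents   INT NOT NULL,
        created_at     DATETIME NOT NULL,
        -- Clustered key: each card's transactions should end up
        -- physically adjacent in the clustered index, in transaction
        -- order.
        PRIMARY KEY (card_id, transaction_id)
    ) ENGINE=InnoDB;

The closest knob I've found is innodb_fill_factor, which (if I understand it correctly) leaves a percentage of every page free during an index build, but that's a global setting rather than "one card per page."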
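And for question 5), by matching the database block size to the disk block size I mean something like the following; 4096 is just an example, and the right value presumably depends on the drive:

    -- SQLite: a new page_size only takes effect on a new database or at
    -- the next VACUUM.
    PRAGMA page_size = 4096;
    VACUUM;

    -- MySQL/InnoDB: the page size is fixed when the data directory is
    -- initialized (innodb_page_size); this just shows the current value.
    SHOW VARIABLES LIKE 'innodb_page_size';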