I'm trying to learn more about MySQL versus SQLite when it comes to
vacuuming and fragmentation, especially as it relates to SSD storage.  Can
anybody summarize the differences for me?

1) My understanding is that SQLite, in general, has no automatic
defragmentation: there is no process that gradually and incrementally
repositions data on disk in index order.  (Even auto_vacuum just reclaims
deleted space by letting new data land in the holes left behind by
previous deletes, which the documentation says can actually worsen
fragmentation.)  The only defragmentation option is a full VACUUM, which
rebuilds the entire database from scratch in index order.  During the
rebuild, the whole database is locked and cannot be used.  Is this roughly
right?
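
For concreteness, here is the sort of thing I mean on the SQLite side
(just a sketch of the commands as I understand them; I may well have
details wrong):

    -- Full rebuild: copies the whole database into a new file in index
    -- order; my understanding is it needs exclusive access while it runs.
    VACUUM;

    -- The auto_vacuum / incremental_vacuum route only reclaims free pages;
    -- as far as I can tell it does not reorder anything on disk.
    PRAGMA auto_vacuum = INCREMENTAL;  -- new db only, or after a VACUUM
    PRAGMA incremental_vacuum(100);    -- release up to 100 free pages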

2) My understanding of MySQL is that the equivalent function is OPTIMIZE
TABLE.  Unlike SQLite's VACUUM, which works on the whole database, it
works on a single table, but is otherwise the same -- it rebuilds the
entire table (and thus needs roughly 2x the disk space), locks the table
during the operation, and is not incremental (it either runs fully or not
at all).  Is this about right?
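
Again, just a sketch of what I think the equivalent looks like, with a
made-up table name:

    -- Rebuild a single table and its indexes; needs scratch space roughly
    -- the size of the table while it runs.
    OPTIMIZE TABLE transactions;

    -- My understanding is that for InnoDB this amounts to a table rebuild,
    -- i.e. roughly the same thing as:
    ALTER TABLE transactions ENGINE=InnoDB;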

3) I've heard it mentioned that in some configurations (perhaps with some
storage engines?) MySQL doesn't need vacuuming for some reason -- somehow
it just doesn't fragment, or it resolves fragmentation as it goes -- but I
can't track down an explanation of why.  Any ideas?

4) Does MySQL (or SQLite?) have any fancy defragmentation capabilities,
such as using a clustered primary index where rows with different leading
key values are never put into the same block?  (For example, imagine a table
containing historical credit card transactions, where transactions are
grouped first by "cardID" and then each card has a sequential list of rows
identified by "transactionID".  Every night, new transactions are added to
each card, meaning new data would ideally be inserted throughout the entire
table, as opposed to just adding at the very end. I could imagine a
clustered index ordering all transactions for a single card back-to-back on
disk, except skipping to the next full database block before adding
transactions for the next card.  This would intentionally leave space free
in the database for additional transactions to be added to each card in
disk-sequential-order, explicitly avoiding transactions from multiple cards
being stored in the same block.  This wouldn't be a complete solution --
the database blocks for a particular card wouldn't necessarily be
sequential -- but the transactions inside each block *would* be sequential,
and no two blocks would contain transactions from different cards.  Does
this make sense?  Does anything like this exist?)
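
To make that concrete, here is roughly the schema I have in mind (all
names are made up).  In InnoDB the primary key is the clustered index, and
in SQLite a WITHOUT ROWID table stores rows in primary-key order, so both
seem to get the ordering part; what I don't know is whether either can be
told to keep different cards in different blocks:

    -- MySQL / InnoDB: the primary key is the clustered index, so rows are
    -- kept in (cardID, transactionID) order within the table's B-tree.
    CREATE TABLE transactions (
        cardID        BIGINT NOT NULL,
        transactionID BIGINT NOT NULL,
        amount_cents  BIGINT NOT NULL,
        PRIMARY KEY (cardID, transactionID)
    ) ENGINE=InnoDB;

    -- SQLite: a WITHOUT ROWID table likewise stores rows in primary-key
    -- order.
    CREATE TABLE transactions (
        cardID        INTEGER NOT NULL,
        transactionID INTEGER NOT NULL,
        amount_cents  INTEGER NOT NULL,
        PRIMARY KEY (cardID, transactionID)
    ) WITHOUT ROWID;

I gather InnoDB also has an innodb_fill_factor setting that leaves some
free space in each page when an index is built, which sounds like the same
idea in spirit, though per-page rather than per-card -- corrections
welcome.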

5) My understanding is that disk order matters a lot for spinning disks,
as it reduces seek times when reading data with neighboring primary keys.
However, am I correct in assuming that the need for this is dramatically
reduced by SSDs, which allow random access?  Granted,
vacuuming should always help to a degree: if your rows are smaller than the
block size, then it's better to have neighboring rows in the same block (so
you needn't read multiple blocks to get the same data).  And it's always
good to ensure your database block size is the same as the disk block size
(whether SSD or spinning) so you only need to read a single disk block per
database block.  But in a world with SSDs and a tremendous amount of RAM,
does vacuuming matter nearly as much as on a spinning disk with constrained
RAM?
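
(By matching block sizes I mean something like the following -- 4096 is
just an example, and I'd match whatever the device actually uses:)

    -- SQLite: page_size takes effect on a brand-new database, or on the
    -- next VACUUM of an existing one.
    PRAGMA page_size = 4096;
    VACUUM;

    -- MySQL / InnoDB: innodb_page_size (16K by default) can only be chosen
    -- when the data directory is first initialized, e.g. in my.cnf:
    --   [mysqld]
    --   innodb_page_size = 4096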

Thanks, I really appreciate your thoughts and links to reading material!

-david
