On 27 Oct 2012, at 11:38am, David Barrett <dbarr...@expensify.com> wrote:

> I'm trying to learn more about MySQL versus sqllite when it comes to
> vacuuming and fragmentation, especially as it relates to SSD storage.


Rather than answer your questions point-by-point, I'm going to give you the 
current state of play.  Your understanding of how the various DBMSes work is 
right, but your excellent question "in a world with SSDs and a tremendous 
amount of RAM, does vacuuming matter nearly as much as on a spinning disk with 
constrained RAM?" cuts to the heart of all your other points.  The following 
involves a little simplification and handwaving because otherwise it would be 
two chapters long and you'd have to do homework.

Fragmentation made a big difference to apps running on Windows, but very little 
on any other platform.  This is because Windows does something called 
'read-ahead caching' which assumes that if you read block B, you're soon going 
to want to read block B+1, so at quite a low level it helpfully pre-reads it 
for you.  Other operating systems don't make this assumption.  This is why 
Windows users talk about defragmentation so much, but Unix users don't care 
about it.

SSDs negate the whole point of defragmentation.  On a rotational disk it's 
faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five random 
blocks from the disk, because the read heads are already positioned in the 
right place, and the disk is going to rotate to show those five blocks in 
order.  SSDs are just like RAM: they're Random Access Memory.  Reading any five 
blocks in any order takes roughly the same time.  So nobody cares about 
fragmentation on an SSD.  Read whatever blocks you want in whatever order you 
want.

As to clever management of disk block alignment with respect to rows and 
columns, this is rarely worth attention these days.  The amount of programming 
and debugging time it takes to get this right, and the amount of extra 
processing and disk access you need to do, give you less return on investment 
than if you spent the same money on buying a faster hard disk.  It's "premature 
optimization" (look up the term) except for two cases: overnight runs and 
realtime 3D graphics.  If your overnight run takes more than one night, you 
have a problem.  If you're programming realtime 3D graphics and they're jerky, 
your users won't enjoy your simulation.  But you wouldn't be using a SQL engine 
for 3D graphics anyway.

The matters you mentioned were all worth attention back in the 1980s when 
storage and bandwidth were expensive.  As you pointed out near the end of your 
post, these things matter less now.

Simon.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to