Wow, I didn't realize this was such a controversial question. I'm a huge sqlite fan. Expensify is built on sqlite. We have a 40GB database, replicated using our custom distributed transaction layer across 5 severs in three different datacenters.[1] It's been powering all of Expensify (including our direct deposit reimbursement engine and credit card import layer -- both of which contain incredibly sensitive information, with mistakes causing millions of dollars to move in the wrong direction). On the back of sqlite, we've grown to over million users, processing millions of dollars in expense reports every day.
However, we're starting to see problems. There is so much activity on some servers that there is never a chance for our checkpointing thread to do its thing, so our WAL file often ballons up to 30GB or more. This makes query times plummet. We regularly checkpoint manually, and often vacuum, all in an effort to keep queries moving quick. We also do things to trick out our indexes in order to ensure proper disk ordering, pay particular attention to block and cache amounts, etc. This isn't premature optimization for the sake of having fun, these are in response to real performance problems affecting our product. In light of that, there is a contingent pushing to drop sqlite in favor of MySQL. There are a wide range of reasons -- it has its own replication, better write concurrency, clustered indexes, and better edge-case data integrity (because we use 2 DBs and WAL, ATTACH doesn't provide atomic commit advantages). And for each I have a corresponding answer -- MySQL's replication isn't as good as ours, concurrency doesn't matter because we serialize writes and have a single threaded server anyway, clustered indexes would be nice but we can get close enough with custom ROWIDs, and the extremely rare situation where there's a cross-database integrity problem, we can detect and recover from any of the other slaves. And I also add in that sqlite can never crash because it's built into the server; its performance is fantastic because it runs in the same process; in years of operation we've never once seen it corrupt data; it's so easy to use; etc etc. But there's an argument I've heard come up to which I don't have a response: MySQL handles fragmentation better, and by extension would give us better performance on the same hardware. I'd like to know more about it, which is why I've asked. Thanks! -david [1] http://permalink.gmane.org/gmane.comp.db.sqlite.general/71868 On Sun, Oct 28, 2012 at 2:12 PM, Pavel Ivanov <paiva...@gmail.com> wrote: > OK. Curiosity is a good thing in certain situations. But could you > kindly tell me what will you do with this information (assuming it's > possible to obtain it of course)? > > Pavel > > On Sat, Oct 27, 2012 at 11:54 PM, David Barrett <dbarr...@expensify.com> > wrote: > > I completely understand the wide and varied differences. I'm just *also* > > interested in this very specific issue. > > > > -david > > > > On Sun, Oct 28, 2012 at 12:38 PM, Pavel Ivanov <paiva...@gmail.com> > wrote: > > > >> > That said, I'd still welcome any quick summary of the differences > between > >> > sqlite and mysql when it comes to fragmentation. > >> > >> This is far from main differences between sqlite and mysql that you > >> should consider if you want to choose between them unless of course > >> your question is just about an academic interest. As you are talking > >> about employees I guess you are not in some purely academic exercise. > >> In this case think more about in-process code vs pumping through > >> TCP/IP stack, designed mostly to be accessible from machine-local > >> processes only vs accessible to anyone on the network, plain access to > >> everything vs versatile and complicated authorization and > >> authentication mechanisms, and so on and so forth. Database format is > >> never a part of the decision which DBMS you want to use. > >> > >> Pavel > >> > >> > >> On Sat, Oct 27, 2012 at 9:32 PM, David Barrett <dbarr...@expensify.com> > >> wrote: > >> > Thanks, this is really helpful! (And I lecture my employees about the > >> > evils of premature optimization all the time. In fact, I'll lecture > >> anyone > >> > in earshot, so frequently that it's the butt of jokes.) > >> > > >> > That said, I'd still welcome any quick summary of the differences > between > >> > sqlite and mysql when it comes to fragmentation. I often get in > debates > >> > about sqlite versus other datbases, and I'm always eager to be > informed. > >> > Thanks! > >> > > >> > -david > >> > > >> > On Sat, Oct 27, 2012 at 11:42 PM, Simon Slavin <slav...@bigfraud.org> > >> wrote: > >> > > >> >> > >> >> On 27 Oct 2012, at 11:38am, David Barrett <dbarr...@expensify.com> > >> wrote: > >> >> > >> >> > I'm trying to learn more about MySQL versus sqllite when it comes > to > >> >> > vacuuming and fragmentation, especially as it relates to SSD > storage. > >> >> > >> >> > >> >> Rather than answer your questions point-by-point, I'm going to give > you > >> >> the current state of play. Your understanding of how the various > DBMSes > >> >> work is right, but your excellent question "in a world with SSDs and > a > >> >> tremendous amount of RAM, does vacuuming matter nearly as much as on > a > >> >> spinning disk with constrained RAM?" cuts to the heart of all your > other > >> >> points. The following involves a little simplification and > handwaving > >> >> because otherwise it would be two chapters long and you'd have to do > >> >> homework. > >> >> > >> >> Fragmentation made a big difference to apps running on Windows, but > very > >> >> little on any other platform. This is because Windows does something > >> >> called 'read-ahead caching' which assumes that if you read block B, > >> you're > >> >> soon going to want to read block B+1, so at quite a low level it > >> helpfully > >> >> pre-reads it for you. Other operating systems don't make this > >> assumption. > >> >> This is why Windows users talk about defragmentation so much, but > Unix > >> >> users don't care about it. > >> >> > >> >> SSDs negate the whole point of defragmentation. On a rotational disk > >> it's > >> >> faster to read blocks B, B+1, B+2, B+4, B+5 than it is to read five > >> random > >> >> blocks from the disk, because the read heads are already positioned > in > >> the > >> >> right place, and the disk is going to rotate to show those five > blocks > >> in > >> >> order. SSDs are just like RAM: they're Random Access Memory. > Reading > >> any > >> >> five blocks in any order takes roughly the same time. So nobody > cares > >> >> about fragmentation on an SSD. Read whatever blocks you want in > >> whatever > >> >> order you want. > >> >> > >> >> As to clever management of disk block alignment with respect to rows > and > >> >> columns, this is rarely worth attention these days. The amount of > >> >> programming and debugging time it takes to get this right, and the > >> amount > >> >> of extra processing and disk access you need to do, give you less > >> return on > >> >> investment than if you spent the same money on buying a faster hard > >> disk. > >> >> It's "premature optimization" (look up the term) except for two > cases: > >> >> overnight runs and realtime 3D graphics. If your overnight run takes > >> more > >> >> than one night, you have a problem. If you're programming realtime > 3D > >> >> graphics and they're jerky, your users won't enjoy your simulation. > But > >> >> you wouldn't be using a SQL engine for 3D graphics anyway. > >> >> > >> >> The matters you mentioned were all worth attention back in the 1980s > >> when > >> >> storage and bandwidth were expensive. As you pointed out near the > end > >> of > >> >> your post, these things matter less now. > >> >> > >> >> Simon. > >> >> > >> >> _______________________________________________ > >> >> sqlite-users mailing list > >> >> sqlite-users@sqlite.org > >> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >> >> > >> > _______________________________________________ > >> > sqlite-users mailing list > >> > sqlite-users@sqlite.org > >> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >> _______________________________________________ > >> sqlite-users mailing list > >> sqlite-users@sqlite.org > >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >> > > _______________________________________________ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users