> On Aug 4, 2017, at 11:28 AM, Nico Williams <n...@cryptonector.com> wrote:
>
> Imagine a mode where there is only a WAL, and to checkpoint is to write
> a new WAL with only live contents and... rename(2) into place.

What you’re describing is exactly how CouchDB’s storage engine works, as well as descendants like Couchbase Server’s CouchStore and ForestDB. (Note: I work for Couchbase.)

Efficient lookups in a file like this require a good deal of additional metadata, such as interior B-tree nodes. This metadata changes all the time as records are written*, so a lot of it has to be written out along with every transaction, resulting in substantial write amplification.

The other big drawback is that compaction (the checkpoint step you describe) is very expensive in terms of I/O. I’ve known of CouchDB systems that took many hours to compact their databases. Every write that occurs during a compaction has to be replayed onto the new file after the copy finishes, and before the compaction can complete. A busy database can therefore get into a state where it either never actually finishes compacting, or has to temporarily block all writers just so it can get the damn job done without interruption. (It’s a similar problem to GC thrash.)

We’ve also seen that, on low-end hardware like mobile devices, I/O bandwidth is limited enough that a running compaction can really harm the responsiveness of the _entire OS_, as well as cause significant battery drain.

--Jens

* Modifying/rewriting a single record requires rewriting the leaf node that points to it, which requires rewriting the parent node that points to the leaf, and this ripples all the way up to the root node.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
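
To put rough numbers on the leaf-to-root ripple described in the footnote, here is a back-of-the-envelope sketch in Python. It is not taken from CouchDB or CouchStore: the function names, the fanout, and the node and record sizes are all hypothetical, and the model assumes one updated record per transaction, with every node on the path appended as a full fresh copy.

```python
def tree_height(num_records: int, fanout: int) -> int:
    """Levels of B-tree nodes (leaf through root) needed to index num_records.

    Simplified model: every node, leaf or interior, holds `fanout` entries.
    """
    height = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        height += 1
    return height


def bytes_appended_per_update(num_records: int, fanout: int,
                              node_size: int, record_size: int) -> int:
    """Bytes appended to the file when one record is rewritten.

    In an append-only file nothing is modified in place, so the new record
    is followed by a new copy of each node on the path from leaf to root.
    """
    return record_size + tree_height(num_records, fanout) * node_size


def write_amplification(num_records: int, fanout: int,
                        node_size: int, record_size: int) -> float:
    """Ratio of bytes appended to bytes of actual record data."""
    appended = bytes_appended_per_update(num_records, fanout,
                                         node_size, record_size)
    return appended / record_size


if __name__ == "__main__":
    # Hypothetical figures: 1M records, 100 entries per node,
    # 4 KiB nodes, 200-byte records.
    print(tree_height(1_000_000, 100))
    print(bytes_appended_per_update(1_000_000, 100, 4096, 200))
    print(write_amplification(1_000_000, 100, 4096, 200))
```

Under these assumed figures, a 200-byte update drags three 4 KiB nodes along with it. Batching several updates into one transaction lets them share the upper levels of the path, which lowers the ratio, but every byte appended this way still has to be read and copied again at the next compaction.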