On Fri, Oct 22, 2010 at 09:56:22PM -0400, Richard Hipp wrote:
> On many (most?) filesystems, it is faster to overwrite an existing area of a
> file than it is to extend the file by writing past the end.  That's why
> SQLite doesn't truncate the WAL file on each checkpoint - so that subsequent
> writes will be overwriting an existing file region and thus go faster.

I think that assumption is getting harder to make.  For one thing,
filesystems nowadays aggregate transactions into large writes, which
means that old blocks aren't overwritten in place, but replaced --
copy-on-write (COW) goes hand in hand with such aggregation.

For ZFS the assumption is wrong because of ZFS's variable block size
support[*].  I don't know of filesystems other than ZFS where the file
data block size varies.  But on filesystems that aggregate writes I
think you'd find that overwriting performs about as well as appending
(assuming there's no O_APPEND synchronization going on).
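The overwrite-vs-append question is easy to poke at directly.  Below is
a rough micro-benchmark sketch (mine, not from the thread); the page
size, page count, and use of os.pwrite are arbitrary choices, and the
result will of course depend heavily on the filesystem and caching:

```python
# Rough sketch: time overwriting an existing file region vs. extending
# a file by writing past its end, one "page" at a time.
import os
import tempfile
import time

PAGE = 4096    # arbitrary page size for the sketch
PAGES = 2000   # arbitrary number of pages

def time_writes(preallocate):
    """Time PAGES positional writes; preallocate=True overwrites an
    existing region, preallocate=False extends the file as it goes."""
    fd, path = tempfile.mkstemp()
    try:
        if preallocate:
            # Create the region up front so the loop below overwrites it.
            os.pwrite(fd, b"\0" * (PAGE * PAGES), 0)
            os.fsync(fd)
        buf = b"x" * PAGE
        start = time.perf_counter()
        for i in range(PAGES):
            os.pwrite(fd, buf, i * PAGE)
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

print("overwrite:", time_writes(True))
print("append:   ", time_writes(False))
```

On a COW filesystem the two paths should come out close, since both end
up allocating fresh blocks anyway; on an overwrite-in-place filesystem
the preallocated case may win, which is the behavior the quoted text
describes.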

Does the WAL store modified pages?

Nico

[*] In ZFS a file has a single data block until its size exceeds the
    dataset's "recordsize", at which point the file consists of two or
    more data blocks, all of that size.  Block sizes are the powers of
    two from 2^9 to 2^17 (512 bytes to 128KB).  Thus overwriting a 1KB
    SQLite3 page in the middle of a file on a dataset with a 128KB
    recordsize results in a read-modify-write of the containing 128KB
    block.  Likely the application will already have caused that block
    to be in memory, in which case there's no RMW, but the disparity
    between the application's "page size" and the ZFS recordsize
    obviously has a significant cost.  SQLite3 users should set the
    SQLite3 page size and the host ZFS dataset's recordsize so they
    match.
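As a concrete sketch of the SQLite3 side of that matching (the ZFS side
would be something like "zfs set recordsize=8K pool/db", with a
hypothetical dataset name), page_size is set via PRAGMA and only takes
effect while the database is still empty; for an existing database you
would follow it with VACUUM:

```python
# Sketch: setting an 8KB SQLite3 page size to match an 8K recordsize.
import sqlite3

conn = sqlite3.connect(":memory:")  # same PRAGMA applies to on-disk files
# Must run before the first write; an existing database needs VACUUM
# afterward for the new page size to take effect.
conn.execute("PRAGMA page_size = 8192")
conn.execute("CREATE TABLE t(x)")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(page_size)
```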
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
