When SQLite writes to the log file, it (1) writes all the data, (2) fsyncs,
then (3) updates the page count in the header, and finally (4) fsyncs again.
Isn't it possible to change SQLite so that steps 3 and 4 are unnecessary?
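
The current ordering can be sketched in Python as follows. This is not SQLite's code or its real journal format. The header layout (a 4-byte big-endian page count at offset 0, padded to `HEADER_SIZE`), the sizes, and the function names are hypothetical, chosen only to show why the header update needs its own fsync: the disk may reorder writes between syncs, so the count must not become durable before the pages it describes.

```python
import os
import struct
import tempfile

PAGE_SIZE = 1024   # hypothetical page size
HEADER_SIZE = 512  # hypothetical header; the page count is its first 4 bytes


def write_journal(path, pages):
    """Write pages using the four-step ordering: data, fsync, count, fsync."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # The header starts out claiming zero pages, so a crash before
        # step 3 leaves a journal that describes nothing.
        os.write(fd, struct.pack(">I", 0).ljust(HEADER_SIZE, b"\0"))
        # Step 1: write all page data after the header.
        for page in pages:
            if len(page) != PAGE_SIZE:
                raise ValueError("every page must be PAGE_SIZE bytes")
            os.write(fd, page)
        # Step 2: make the data durable before anything points at it.
        os.fsync(fd)
        # Step 3: publish the page count in the header.
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, struct.pack(">I", len(pages)))
        # Step 4: make the header update durable as well.
        os.fsync(fd)
    finally:
        os.close(fd)


def read_page_count(path):
    """Return the page count recorded in the hypothetical header."""
    with open(path, "rb") as f:
        return struct.unpack(">I", f.read(4))[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "journal")
        write_journal(p, [b"a" * PAGE_SIZE, b"b" * PAGE_SIZE])
        print(read_page_count(p))  # 2
```

Steps 3 and 4 cost an extra seek, write, and fsync per transaction, which is the overhead the question is about.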

In particular, if SQLite had an end-of-file marker for the transaction log, and
if the page checksums were more reliable, then the page count in the header
would be unnecessary. Once the checksum is as reliable as the write ordering at
detecting an incompletely written transaction, the choice between the two
depends only on performance. And as the gap between CPU performance and disk
sync performance widens, it stands to reason that checksumming will beat an
extra sync for a growing share of transactions. Is there some flaw in this
idea?
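
For illustration, here is a minimal Python sketch of that alternative: each record carries its own length and checksum, and recovery treats the first short or mismatching record as the end of the log, so no header needs a second write or fsync. The frame layout (4-byte length, 4-byte checksum, payload) and the names `encode_frame` and `recover` are hypothetical, and Python's `zlib.crc32` (the IEEE CRC-32, not CRC-32C) stands in for whatever stronger checksum would actually be chosen.

```python
import struct
import zlib

HDR = struct.Struct(">II")  # hypothetical frame header: payload length, checksum


def _checksum(length: int, payload: bytes) -> int:
    # Cover the length field too, so a garbled length is also detected.
    return zlib.crc32(payload, zlib.crc32(struct.pack(">I", length)))


def encode_frame(payload: bytes) -> bytes:
    """Frame one record as length, checksum, payload."""
    return HDR.pack(len(payload), _checksum(len(payload), payload)) + payload


def recover(log: bytes) -> list:
    """Return the payloads of the valid prefix of the log.

    Scanning stops at the first frame that is truncated or whose checksum
    does not match; everything from there on is treated as past the end.
    """
    frames, pos = [], 0
    while pos + HDR.size <= len(log):
        length, crc = HDR.unpack_from(log, pos)
        start = pos + HDR.size
        end = start + length
        if end > len(log):
            break  # frame runs past the end of the file: torn append
        payload = log[start:end]
        if _checksum(length, payload) != crc:
            break  # mismatch: torn, stale, or corrupt data
        frames.append(payload)
        pos = end
    return frames
```

The safety of this scheme rests entirely on the checksum: a torn or stale frame that happens to match its checksum would be accepted, and that is the risk being traded against the extra fsync.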

I did read this in the SQLite source code:

** This is not a real checksum.  It is really just the sum of the 
** random initial value and the page number.  We experimented with
** a checksum of the entire data, but that was found to be too slow.

From this, we can see that the "checksum" currently used by SQLite is not
reliable at detecting errors on its own; a stronger checksum would be needed.
I read RFC 3385 [1], which describes the checksums used by the iSCSI protocol.
Its authors provide evidence that their choice, CRC-32C, is very reliable at
detecting errors while being fast and simple to implement. I haven't done my
own measurements yet, but I find it hard to believe that, for typical
transactions, computing a CRC-32C is slower than the extra seek+write+fsync
that is required. When you did your testing, how many pages did a transaction
have to touch before checksumming became as slow as the seek+write+fsync?

Besides my interest in this as a SQLite user, I am also interested because I am
building a simple persistent log file system where the entries are very small
(100 bytes each). I would like to avoid any requirement to align data on sector
boundaries, to avoid doing multiple fsyncs if possible, and to write to the
file in an append-only fashion. Also, the CouchDB developers recently raised a
similar issue [2]. FWIW, I am more interested in the safety of this approach
(vs. write ordering) than in its performance, as my application almost never
writes more than one sector per transaction.
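
A minimal sketch of the writer side for such a log, assuming roughly 100-byte entries framed in a hypothetical way (4-byte length, 4-byte CRC-32 over length and payload, then the payload): each transaction is one unaligned append followed by a single fsync. The names `frame` and `append_transaction` are illustrative only, and `zlib.crc32` again stands in for a stronger checksum such as CRC-32C.

```python
import os
import struct
import zlib


def frame(entry: bytes) -> bytes:
    """Hypothetical frame: length, CRC-32 over length+payload, payload."""
    length = struct.pack(">I", len(entry))
    crc = zlib.crc32(entry, zlib.crc32(length))
    return length + struct.pack(">I", crc) + entry


def append_transaction(path: str, entries: list) -> None:
    """Append all entries as one unaligned write, then sync exactly once."""
    buf = b"".join(frame(e) for e in entries)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(buf)
        while view:  # os.write may write fewer bytes than requested
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)  # the only sync: there is no header to update afterwards
    finally:
        os.close(fd)
```

On POSIX systems, a newly created file's directory entry is not guaranteed to be durable until the containing directory is also fsynced, so a real implementation would need that extra step once, when the log file is first created.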

Thanks,
Brian

[1] http://tools.ietf.org/html/rfc3385
[2] http://damienkatz.net/2008/02/faster_couchdb.html

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
