On 4th February 2012 12:51pm, Alexey Pechnikov wrote:

> It's very important but there are some questions about
> http://www.sqlite.org/src/info/72b01a982a
> Some times ago DRH wrote that checksum calculation don't slow down
> SQLite significantly.
> But can be this realized in current SQLite 3.x branch? When checksum
> can help to restore
> damaged pages/database? Does powersave overwrite feature conflicts
> with idea of per pages
> checksums?

Obviously I can't speak for DRH and co., but I can speak from my own 
experience, and maybe it will be beneficial to the debate.

I implemented a checksumming feature into sqlite back in 2008/9 and have 
maintained it ever since and now have it in my branch of 3.7.10.  This 
checksum is an 8-bit crc and is calculated *per row* on any table for which 
it is enabled.  My implementation adds two bytes overhead to each row of 
checksummed tables in the database.  I found that when used on a 400mhz 
PowerPC embedded platform there was a very small but measurable performance 
regression when dealing with hundreds of thousands of rows of data compared 
to a database file not performing the crc calculation.  On a desktop PC this 
performance regression was not measurable (i.e. insignificant).

I won't go into the details of exactly how it has been implemented, unless 
someone is particularly interested, but to say this:

Since I observed a slight performance regression on slow hardware I made it 
possible to turn the feature on and off on a per-table basis.  This meant a 
small addition to the SQL syntax for CREATE TABLE.  I wanted the checksum 
calculation to only occur when the row was created and only when checking 
for checksum errors in the database so that in general use performance 
wouldn't be affected.  I also wished the checksum to be "invisible" to the 
user in normal operation, e.g. when performing a SELECT * FROM table, or 
when using an INSERT or UPDATE statement.

In the implementation it became necessary to upgrade the file format to "5" 
to support all these features, and to remain compatible with old database 
files.  In addition to PRAGMA legacy_file_format, I also have PRAGMA 
compatible_file_format, so that the user can choose between 0, 4 and 5 as 
the format for the database file.  Only in format "5" can the checksum be 
used.

The user cannot find out the checksum of a row, but instead I implemented an 
additional SQL function which checks the checksum of a particular row and 
returns '0' or '1' depending on whether the row had become corrupted.  A 
simple statement like this:

SELECT rowid FROM table WHERE !valid_row_checksum(rowid);

is enough to find corrupted rows.

If the checksum feature were to be added to Sqlite as standard, as proposed 
by Roger Binns, I would strongly suggest that it should be possible to 
fine-grain the corruption detection to a single row of data.  To an actual 
column within the row would be nice, but I think too much of an overhead to 
be truly worth while.  My implementation also didn't create any checksums on 
data other than row data.  For me that wasn't an issue, but I could see that 
in other applications it may be a nice feature.

I have a complete patch for my implementation with documentation and 
test-suite additions.  If anyone is interested, even the Sqlite 
developers(!), then drop me an email and I'll post it along.  I have no 
restriction on me stopping this patch being released to the public domain.

Andy




_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to