On 4th February 2012 12:51pm, Alexey Pechnikov wrote: > It's very important but there are some questions about > http://www.sqlite.org/src/info/72b01a982a > Some times ago DRH wrote that checksum calculation don't slow down > SQLite significantly. > But can be this realized in current SQLite 3.x branch? When checksum > can help to restore > damaged pages/database? Does powersave overwrite feature conflicts > with idea of per pages > checksums?
Obviously I can't speak for DRH and co., but I can speak from my own experience, and maybe it will be beneficial to the debate. I implemented a checksumming feature into sqlite back in 2008/9 and have maintained it ever since and now have it in my branch of 3.7.10. This checksum is an 8-bit crc and is calculated *per row* on any table for which it is enabled. My implementation adds two bytes overhead to each row of checksummed tables in the database. I found that when used on a 400mhz PowerPC embedded platform there was a very small but measurable performance regression when dealing with hundreds of thousands of rows of data compared to a database file not performing the crc calculation. On a desktop PC this performance regression was not measurable (i.e. insignificant). I won't go into the details of exactly how it has been implemented, unless someone is particularly interested, but to say this: Since I observed a slight performance regression on slow hardware I made it possible to turn the feature on and off on a per-table basis. This meant a small addition to the SQL syntax for CREATE TABLE. I wanted the checksum calculation to only occur when the row was created and only when checking for checksum errors in the database so that in general use performance wouldn't be affected. I also wished the checksum to be "invisible" to the user in normal operation, e.g. when performing a SELECT * FROM table, or when using an INSERT or UPDATE statement. In the implementation it became necessary to upgrade the file format to "5" to support all these features, and to remain compatible with old database files. In addition to PRAGMA legacy_file_format, I also have PRAGMA compatible_file_format, so that the user can choose between 0, 4 and 5 as the format for the database file. Only in format "5" can the checksum be used. The user cannot find out the checksum of a row, but instead I implemented an additional SQL function which checks the checksum of a particular row and returns '0' or '1' depending on whether the row had become corrupted. A simple statement like this: SELECT rowid FROM table WHERE !valid_row_checksum(rowid); is enough to find corrupted rows. If the checksum feature were to be added to Sqlite as standard, as proposed by Roger Binns, I would strongly suggest that it should be possible to fine-grain the corruption detection to a single row of data. To an actual column within the row would be nice, but I think too much of an overhead to be truly worth while. My implementation also didn't create any checksums on data other than row data. For me that wasn't an issue, but I could see that in other applications it may be a nice feature. I have a complete patch for my implementation with documentation and test-suite additions. If anyone is interested, even the Sqlite developers(!), then drop me an email and I'll post it along. I have no restriction on me stopping this patch being released to the public domain. Andy _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users