A checksum per-row is certainly a valuable thing at the layer above
the RDBMS since it allows for integrity checking above the RBDMS, and
in an RBDMS-independent manner.  Of course, this approach is easiest
to graft into SQLite3 simply because you'd be adding a hidden column
and the machinery for maintaining and checking it fits into a handful
of VM opcodes, thus barely changing the on-disk format, but...

... for an integrity protection scheme within the RDBMS I would argue
that it's better to have per-page checksums.  The argument is based on
the ZFS argument for checksumming entire blocks (and storing the
checksum with the block pointers): that you then pay for the cost of
checksumming only once per-I/O (whereas with a per-row solution you
either pay the checksum penalty every time you read that row, even
from cache, or you must do some extra bookkeeping to ensure that you
checksum no more than once per-{I/O, row}).  I suppose this might not
work out as well for an RDBMS as for ZFS since ZFS is able to get
offsetting benefits, such as reduced I/O times by compressing blocks.

Which brings me to a different point, which is that if you can host a
SQLite3 DB on a filesystem like ZFS then you don't have to concern
yourself quite as much with doing data integrity protection within
SQLite3.  What's the best layer for integrity protection in your
application, then?  So far we have three possible layers: app-layer,
SQLite3, filesystem.  If you trust your storage to provide integrity
protection you might add that as a layer.  I'm not sure that there's a
one-size-fits-all answer here, and I find your own solution to be
rather clever (since easy to graft in and keep up to date).

One common issue in all cases is: how to report integrity check
failures, and what else to do about them.  This is another area where
ZFS gets benefits you're not likely to get from higher-layers: since
ZFS integrates volume management into the filesystem it can use
integrity protection checksums to help correct errors when there is
redundancy (e.g., mirroring) in the volume.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to