"Brian Smith" <[EMAIL PROTECTED]> wrote: > When SQLite writes to the log file, it > > (1) writes all the data, > (2) fsyncs, then > (3) updates the page count in the header, and finally > (4) fsyncs again. > > Isn't it possible to change SQLite so that the steps 3 > and 4 are unnecessary? >
That depends on your filesystem. On many modern file systems you can safely omit 3 and 4. And if the xDeviceCharacteristics() method of the VFS implementation for a particular filesystem reports SQLITE_IOCAP_SAFE_APPEND, then SQLite skips steps 3 and 4. Steps 3 and 4 are also skipped if you set PRAGMA synchronous=NORMAL; instead of the default PRAGMA synchronous=FULL; It has been reported to us that by omitting steps 3 and 4 you get about a 30% speed improvement on MacOS X. But without steps 3 and 4 and on some filesystems, a power failure that occurs while the journal is being written can result in database corruption. The damage happens like this: The journal is written to the disk out-of-order so that later parts of the journal are written before some earlier parts. And the power failure occurs before the entire journal is written. So you are left with something like this: ........****.... Where "." indicates valid data, and "*" indicates random trash that just happen to be on the disk - not information that was written by SQLite. Once power is restore, the next SQLite process to open the file would see the "hot" journal, notice that both the header and the "end mark" are valid, but the stuff in the middle is not. That invalid stuff in the middle will get played back into the database, corrupting it. It is true that a sufficiently strong checksum might detect the corruption in the middle. SQLite does do some checksumming to try to detect this sort of problem when you specify PRAGMA synchronous=NORMAL. But checksums are not 100%. The SAFE_APPEND attribute means that the filesystem does not extend the size of the file until the content has been safely written to oxide. For filesystems that support SAFE_APPEND, there is no possibility of getting corrupt data in the file during a power failure. I suspect that most modern journalling filesystems are SAFE_APPEND, but I'll continue to assume the worst until I know for sure. Early versions of SQLite2 always assumed SAFE_APPEND. Then we got some reports from the field of corruption following power loss that was ultimately traced back to the scenario described above. That's why we added the extra fsyncs and checksums. -- D. Richard Hipp <[EMAIL PROTECTED]> _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users