[EMAIL PROTECTED] wrote:
> Check-in [3548] fixes a problem in the pager which can lead to
> database corruption on a heavily loaded system running autovacuum.
> I am continuing to analyze the problem in order to fully
> characterize the circumstances under which database corruption
> might occur. Once this analysis is complete, you can expect
> to see the release version 3.3.9 containing the fix.
>
I am still attempting to characterize the circumstances under which database corruption can occur. I need additional data from Ron Aviel in order to continue with this analysis, and he will likely be unavailable until tomorrow. So 3.3.9 will probably not be out until later this week.

So far, the only path I have found that can lead to corruption is if two processes both try to roll back a hot journal at the same time. These two processes will race to get a lock on the database. Only one will succeed. The second process will back off. But that second process might have left its cache in an inconsistent state, which could later result in database corruption.

A hot journal can only result if a process that is in the middle of a write transaction dies or otherwise terminates without shutting down SQLite cleanly.

Recap: The only path to corrupting a database so far discovered in the bug fixed by [3548] is as follows:

  (1) One process starts a write transaction, makes changes to the database which are incomplete, then aborts or exits without closing the database and completing the transaction.

  (2) Two other processes attempt to access the database at almost the same moment in time. Both see that the database was only partially updated in the previous step, and both attempt to play back the journal in order to roll back the transaction. Only one will be successful at this. The other will back off.

  (3) The second of the two processes above, the one that did not play back the journal, goes on to make other changes to the database file based on an incorrect cache image, resulting in database corruption.

This is a very unlikely sequence of events. Step (1) should not often happen on an otherwise well-behaved system. You will be very hard-pressed to make (2) happen unless you have multiple processors, and even then the race condition appears to be very tight. There may be other paths which can exercise the problem, but this is the only one that I have found so far.
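To make the three-step sequence concrete, here is a minimal sketch of it as a deterministic toy model. It is not SQLite's pager code. The two-page "database", the `index-of:` invariant, and the `simulate` function are hypothetical names invented for illustration. The `refresh_cache_after_backoff` option is only an assumed remedy shown for contrast, since the message does not say what check-in [3548] actually changes.

```python
def simulate(refresh_cache_after_backoff):
    """Toy model of steps (1)-(3). Returns True if the final file is consistent."""
    # Invariant of this toy "database": page 2 must index the content of page 1.
    disk = {1: "v1", 2: "index-of:v1"}

    # Step (1): a writer journals the original pages, changes page 1,
    # then dies before updating page 2. The leftover journal is "hot".
    journal = dict(disk)
    disk[1] = "v2"

    # Step (2): processes A and B both see the hot journal, and both have
    # read the partially updated file into their caches.
    cache_a = dict(disk)
    cache_b = dict(disk)

    # A wins the lock and plays back the journal, restoring the original pages.
    disk.update(journal)
    journal = None
    cache_a = dict(disk)

    # B lost the race and backs off. Hypothetical remedy (an assumption,
    # not the documented fix): B discards its now-stale cache.
    if refresh_cache_after_backoff:
        cache_b = dict(disk)

    # Step (3): B rewrites the index page from its cached copy of page 1.
    disk[2] = "index-of:" + cache_b[1]

    # Check the toy invariant against what is actually on disk.
    return disk[2] == "index-of:" + disk[1]
```

In this model, `simulate(False)` returns `False`: B builds page 2 from the rolled-back "v2" image still in its cache, so the file ends up inconsistent. `simulate(True)` returns `True` because B rereads the file after backing off. The real race is far tighter than this fixed interleaving suggests.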
Because this is so obscure, I think I am justified in waiting another day or two before pushing out version 3.3.9 in order to better understand what is going on.

--
D. Richard Hipp <[EMAIL PROTECTED]>