[EMAIL PROTECTED] wrote:
> Check-in [3548] fixes a problem in the pager which can lead to
> database corruption on a heavily loaded system running autovacuum.
> I am continuing to analyze the problem in order to fully
> characterize the circumstances under which database corruption
> might occur.  Once this analysis is complete, you can expect
> to see the release of version 3.3.9 containing the fix.
> 

I am still attempting to characterize the circumstances under
which database corruption can occur.  I need additional data
from Ron Aviel in order to continue with this analysis and
he will likely be unavailable until tomorrow.  So 3.3.9 will
probably not be out until later this week.

So far, the only path I have found that can lead to corruption
is if two processes both try to roll back a hot journal at the
same time.  These two processes will race to get a lock on the
database.  Only one will succeed.  The second process will back
off.  But that second process might have left its cache in
an inconsistent state which could later result in database
corruption.  A hot journal can only result if a process that
is in the middle of a write transaction dies or otherwise
terminates without shutting down SQLite cleanly.

Recap:  The only path to corrupting a database so far discovered
in the bug fixed by [3548] is as follows:

  (1) One process starts a write transaction, makes changes to 
      the database which are incomplete, then aborts or exits 
      without closing the database and completing the transaction.

  (2) Two other processes attempt to access the database at almost
      the same moment in time.  Both see that the database was only 
      partially updated in the previous step and both attempt to 
      play back the journal in order to roll back the transaction.
      Only one will be successful at this.  The other will back off.

  (3) The second of the two processes above, the one that did
      not play back the journal, goes on to make other changes
      to the database file based on an incorrect cache image -
      resulting in database corruption.
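For anyone who wants to poke at the recap above, here is a minimal toy model in Python.  It is a sketch under stated assumptions, not SQLite's pager code.  The "database file" is a dict mapping page numbers to strings, each process keeps a private page cache, and the lock race in step (2) is replaced by a fixed interleaving in which process A wins and process B backs off.  The class Process, its methods, and the reset_cache_on_backoff flag are hypothetical names made up for illustration, and the real mechanism behind the bug fixed by [3548] may differ in its details.

```python
# Hypothetical toy model of the hot-journal race (NOT SQLite's pager code).
# A "database file" is a dict of page number -> contents, each process has
# a private page cache, and the lock race is a fixed, deterministic order.


class Process:
    def __init__(self, reset_cache_on_backoff):
        self.cache = {}
        # Hypothetical flag: does this process discard its cache on back-off?
        self.reset_cache_on_backoff = reset_cache_on_backoff

    def read_page(self, db, pgno):
        # Serve the page from the private cache if present, else read the file.
        if pgno not in self.cache:
            self.cache[pgno] = db[pgno]
        return self.cache[pgno]

    def rollback_hot_journal(self, db, journal):
        # Winner of the lock race: restore original pages, delete the journal,
        # and drop its own cache so later reads see the restored file.
        for pgno, original in journal.items():
            db[pgno] = original
        journal.clear()
        self.cache.clear()

    def back_off(self):
        # Loser of the lock race.  Anything it cached came from the
        # half-written file, so the safe choice is to forget it.
        if self.reset_cache_on_backoff:
            self.cache.clear()

    def append_to_page(self, db, pgno, suffix):
        # A later write that builds the new page from the cached copy.
        new_value = self.read_page(db, pgno) + suffix
        self.cache[pgno] = new_value
        db[pgno] = new_value


def run_scenario(reset_cache_on_backoff):
    db = {1: "a0", 2: "b0"}
    journal = {}

    # (1) A writer journals page 1, changes it, then dies before committing,
    #     leaving a hot journal behind.
    journal[1] = db[1]
    db[1] = "a1"

    # (2) Two processes both read page 1 from the half-updated file.
    #     A wins the lock and rolls back; B backs off.
    a = Process(reset_cache_on_backoff)
    b = Process(reset_cache_on_backoff)
    a.read_page(db, 1)
    b.read_page(db, 1)
    a.rollback_hot_journal(db, journal)
    b.back_off()

    # (3) B later modifies page 1 starting from whatever it has cached.
    b.append_to_page(db, 1, "+x")
    return db


if __name__ == "__main__":
    print("stale cache:", run_scenario(reset_cache_on_backoff=False))
    print("reset cache:", run_scenario(reset_cache_on_backoff=True))
```

With reset_cache_on_backoff=False, process B builds its change on its stale cached copy of page 1, so page 1 ends up as "a1+x" and the change the dead writer never committed reappears in the file.  With the flag set, B forgets its cache when it backs off, rereads page 1 after the rollback, and page 1 ends up as "a0+x".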

This is a very unlikely sequence of events.  Step (1) should
not often happen on an otherwise well-behaved system.  You will
be very hard-pressed to make (2) happen unless you have multiple
processors, and even then the race condition appears to be very
tight.  

There may be other paths which can exercise the problem, but this
is the only one that I have found so far.  Because this is so
obscure, I think I am justified in waiting another day or two 
before pushing out version 3.3.9 in order to better understand what
is going on.

--
D. Richard Hipp  <[EMAIL PROTECTED]>


