I’m having an issue with a library I am writing.  This has not happened before 
in the 1 ½ years I have been developing the library, but for some reason now 
it’s rearing its ugly head from time to time.  The library will be happily 
running along and then suddenly a SELECT statement will return error code 26 
upon step.  After a while of this, it will change to error 11 instead.  The 
database has become corrupted and the integrity check reports a different kind 
of problem with a different page each time.  This seems to happen more often on 
Windows tablets.  I have only observed the behavior in real time once (and it 
was on a VM) but logs sent from a client have shown this behavior quite a few 
times.  Mostly they are using SQLCipher based on SQLite 3.8.10.2, but I 
observed the behavior with regular SQLite 3.12.2

The library manages logical databases, which contain a few backends (one being 
SQLite).  Each logical database contains two connections to an SQLite database. 
 A read only connection that is used for reads outside the library, and a read 
/ write connection used for writing and internal reads during transactions.  
The read only connection has full mutex enabled (though I’m not sure if this is 
necessary) so that I can share it between threads without caring.  The RW 
connection is pumped through a queue on a single thread and all other threads 
must wait their turn until their job is finished.  I confirmed that no writes 
are taking place outside of this thread.  This connection is in “multi thread” 
mode I think (compiled with thread safety on but opened with NO_MUTEX).

The library caches and manages logical databases so whenever a consuming app 
requests one they will either get a new one if one hasn’t been made or the same 
one if it has.  This ensures that only one logical database is ever interacting 
with the sqlite connection handles of an sqlite database.  The logs from the 
client and my own observations have backed this up.  

From what I have read, error 11 is extremely hard to cause through library 
usage alone.  The key offences seem to be:

1) Using two versions of SQLite at once in an application
2) Bad OS file locking
3) A rogue process writing garbage into the file

The library is written in C#, and it’s true that two versions of SQLite can be 
used with it (regular SQLite or SQLCipher), however I have limited this so once 
the user has chosen the implementation they want to use they are not allowed to 
change it after that.  The logs and my observations have shown that only one 
version of the storage engine is constructed (each storage engine binds with a 
native library via P/Invoke).  

My ultimate question is:  How can I figure out where the corruption is 
happening when it does?  I’ve learned quite a bit about the actual file format 
of SQLite and spent a number of hours pouring through the SQLite files I got 
from the client with a hex editor looking for signs of bad things getting 
written but in the end that was a bit difficult for me.  The thing is, by the 
time the symptoms show it is already past when the corruption occurred from 
what I gather.  There are no network file systems involved in this, it’s purely 
local.  Are there any things I should be on the look out for?  I’d appreciate 
any advice.  


--Jim

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to