I’m having an issue with a library I am writing. This has not happened before in the 1 ½ years I have been developing the library, but for some reason now it’s rearing its ugly head from time to time. The library will be happily running along and then suddenly a SELECT statement will return error code 26 upon step. After a while of this, it will change to error 11 instead. The database has become corrupted and the integrity check reports a different kind of problem with a different page each time. This seems to happen more often on Windows tablets. I have only observed the behavior in real time once (and it was on a VM) but logs sent from a client have shown this behavior quite a few times. Mostly they are using SQLCipher based on SQLite 3.8.10.2, but I observed the behavior with regular SQLite 3.12.2
The library manages logical databases, which contain a few backends (one being SQLite). Each logical database contains two connections to an SQLite database. A read only connection that is used for reads outside the library, and a read / write connection used for writing and internal reads during transactions. The read only connection has full mutex enabled (though I’m not sure if this is necessary) so that I can share it between threads without caring. The RW connection is pumped through a queue on a single thread and all other threads must wait their turn until their job is finished. I confirmed that no writes are taking place outside of this thread. This connection is in “multi thread” mode I think (compiled with thread safety on but opened with NO_MUTEX). The library caches and manages logical databases so whenever a consuming app requests one they will either get a new one if one hasn’t been made or the same one if it has. This ensures that only one logical database is ever interacting with the sqlite connection handles of an sqlite database. The logs from the client and my own observations have backed this up. From what I have read, error 11 is extremely hard to cause through library usage alone. The key offences seem to be: 1) Using two versions of SQLite at once in an application 2) Bad OS file locking 3) A rogue process writing garbage into the file The library is written in C#, and it’s true that two versions of SQLite can be used with it (regular SQLite or SQLCipher), however I have limited this so once the user has chosen the implementation they want to use they are not allowed to change it after that. The logs and my observations have shown that only one version of the storage engine is constructed (each storage engine binds with a native library via P/Invoke). My ultimate question is: How can I figure out where the corruption is happening when it does? I’ve learned quite a bit about the actual file format of SQLite and spent a number of hours pouring through the SQLite files I got from the client with a hex editor looking for signs of bad things getting written but in the end that was a bit difficult for me. The thing is, by the time the symptoms show it is already past when the corruption occurred from what I gather. There are no network file systems involved in this, it’s purely local. Are there any things I should be on the look out for? I’d appreciate any advice. --Jim _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users