> The problem I think may be people using TerminateThread(); that's how > you hard kill a thread. It seems that can interrupt an I/O > operation -- > ie an operation that writes more than one cluster at a time. Meaning, > synch = full may have nothing to do with it. If you have to say write > more than one cluster (disk block), TerminateThread looks like it can > abort the IO op in the middle of a multi-block op? I'm trying to run > that down but can't yet find anything that verifies this.
Even if it does, which seems quite plausible, the only way I can see this causing corruption is if you are in persistent journal mode and a (weak) checksum gives you a false positive on the last, corrupted, record in the journal file. Maybe if you're using a version from a few years back it could happen without persistent-journal mode too. It's easy to be wrong about this kind of thing though. Dan. > So, here is what I think: > 1) You need to write some data, lets say a 8K write. In this > theoretical example, disk blocks are 4k. > 2) You call WriteFile(8K). > 3) WriteFile causes a system call, and the system schedules the first > bit of the i/o (1st 4k). > 4) terminatethread() is called > 5) I/O operation is cancelled (ie, as if CancelIO had been called?), > meaning block #2 was never scheduled. Database now corrupt. > > A lot of people using windows tend to have this kind of threading > architecture: > 1) Signal the thread to exit on its own > 2) Wait for some grace period. > 3) if grace period expires, and thread is still running -- > TerminateThread() because once the grace period expires, the thread is > considered hung. > > So, large transactions in a thread could cause people to use > TerminateThread() at a critical time, especially if that causes the > thread to go over its grace period. For us, these large transactions > took longer than our grace period to complete, and thus were subject > to > TerminateThread -- and lots of corruption. Once we sped everything up > and moved to synch=full, no transaction was even close to the grace > period, and such, terminatethread() is never called.. and we get no > corruption. > > Just a thought. > > -----Original Message----- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy > Sent: Wednesday, October 14, 2009 12:36 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Corrupted database > > > On Oct 14, 2009, at 12:42 AM, McClellen, Chris wrote: > >> But it does happen and we can reproduce it. Hard killing a thread is >> essentially equivalent to turning off the power. > > We have always assumed that it is different. When you write data to > a file, the write is buffered in volatile memory by the OS for a time. > If a power failure occurs during this time, the write is lost. But if > a thread is killed, the OS should still eventually make sure the data > gets to stable storage. > > If you kill the application, then open the database using the shell > tool, is the database corrupted? > > Dan. > > > >> -----Original Message----- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy >> Sent: Tuesday, October 13, 2009 12:35 AM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] Corrupted database >> >> >> On Oct 13, 2009, at 1:56 AM, McClellen, Chris wrote: >> >>> What is your synchronous set to? Full? FYI If you are using .NET >>> data >>> providers, it is set to "Normal" by default. >>> >>> If it is not set to full, I have seen corruption when an application >>> crashes, or exits when a thread is in the middle of updating the db >>> (Synchronous = OFF makes corruption even easier in this case). I >>> have >>> seen apps that do not wait for background threads to finish before >>> termination, and without full sync on, either the db or the log gets >>> corrupted. A corrupted log can cause problems for you db on next >>> run >>> when recovery happens. >> >> In theory, this shouldn't happen. Unless the application is actually >> buffering data that SQLite thinks has been written to the database or >> journal file in the process space on some systems. >> >> The "synchronous" setting should only make a difference in the event >> of a power or OS failure. That's the theory, anyway. >> >> Dan. >> >> >> >>> >>> >>> >>> -----Original Message----- >>> From: sqlite-users-boun...@sqlite.org >>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Filip Navara >>> Sent: Monday, October 12, 2009 12:38 PM >>> To: General Discussion of SQLite Database >>> Subject: [sqlite] Corrupted database >>> >>> Hello, >>> >>> for a few months we have been occasionally getting corrupted >>> databases >>> in the field. So far we were unable to acquire any of them from our >>> customers, but this week I finally got hold of one. Output from >>> "pragma integrity_check" is included below. >>> >>> The schema is the following: >>> >>> CREATE VIRTUAL TABLE "LocalMailsIndex3" USING fts3 (id INTEGER, >>> partName TEXT, content TEXT); >>> CREATE TABLE LocalMailsIndex3_content( docid INTEGER PRIMARY >>> KEY,c0id, c1partName, c2content); >>> CREATE TABLE LocalMailsIndex3_segdir( level integer, idx integer, >>> start_block integer, leaves_end_block integer, end_block integer, >>> root blob, primary key(level, idx)); >>> CREATE TABLE LocalMailsIndex3_segments( blockid INTEGER PRIMARY >>> KEY, >>> block blob); >>> CREATE INDEX "LocalMailsIndex3_contentIndex" ON >>> "LocalMailsIndex3_content" ("c0id", "c1partName"); >>> >>> The database is created using SQLite 3.6.14.2, thread safe, on >>> Windows >>> with auto_vacuum=incremental. It is always opened as attached >>> database >>> with journal_mode=persist. Application crashes were most probably >>> involved, but no operating system / power crashes as far as I know. >>> >>> One thread in the application is periodically running "pragma >>> freelist_count" and "pragma incremental_vacuum(...)". Other threads >>> are running combination of the following commands and no other: >>> >>> INSERT INTO mail_fti.LocalMailsIndex3("id", "partName", "content") >>> VALUES (@id, @partName, @content) >>> SELECT c2content AS content FROM mail_fti.LocalMailsIndex3_content >>> WHERE c0...@id AND c1partna...@partname >>> SELECT docid FROM mail_fti.LocalMailsIndex3_content WHERE c0id IN >>> (...) >>> DELETE FROM mail_fti.LocalMailsIndex3 WHERE doc...@docid >>> SELECT ... WHERE "id" IN (SELECT "id" FROM mail_fti.LocalMailsIndex3 >>> WHERE content MATCH "...") >>> >>> Anybody has seen something like this? >>> Anybody willing to look at it? I can send the database privately. >>> >>> Best regards, >>> Filip Navara >>> >>> Main freelist: Bad ptr map entry key=5143 expected=(2,0) >>> got=(3,4467) >>> Main freelist: freelist leaf count too big on page 5143 >>> Main freelist: Bad ptr map entry key=5449 expected=(2,0) >>> got=(4,5143) >>> Main freelist: freelist leaf count too big on page 5449 >>> Main freelist: 904 of 908 pages missing from overflow list starting >>> at >>> 5143 >>> On tree page 3878 cell 26: invalid page number 5737 >>> On tree page 3878 cell 26: Child page depth differs >>> On tree page 3878 cell 27: Failed to read ptrmap key=5746 >>> On tree page 3878 cell 27: invalid page number 5746 >>> On tree page 3878 cell 28: Failed to read ptrmap key=5748 >>> On tree page 3878 cell 28: invalid page number 5748 >>> On tree page 3878 cell 29: Failed to read ptrmap key=5749 >>> On tree page 3878 cell 29: invalid page number 5749 >>> On tree page 3878 cell 30: Failed to read ptrmap key=5755 >>> On tree page 3878 cell 30: invalid page number 5755 >>> On tree page 3878 cell 31: Failed to read ptrmap key=5757 >>> On tree page 3878 cell 31: invalid page number 5757 >>> On tree page 3878 cell 32: Failed to read ptrmap key=5759 >>> On tree page 3878 cell 32: invalid page number 5759 >>> On tree page 3878 cell 33: Failed to read ptrmap key=5761 >>> On tree page 3878 cell 33: invalid page number 5761 >>> On tree page 3878 cell 34: Failed to read ptrmap key=5763 >>> On tree page 3878 cell 34: invalid page number 5763 >>> On tree page 3878 cell 35: Failed to read ptrmap key=5767 >>> On tree page 3878 cell 35: invalid page number 5767 >>> On tree page 3878 cell 36: Failed to read ptrmap key=5769 >>> On tree page 3878 cell 36: invalid page number 5769 >>> On tree page 3878 cell 37: Failed to read ptrmap key=5771 >>> On tree page 3878 cell 37: invalid page number 5771 >>> On tree page 3878 cell 38: Failed to read ptrmap key=5773 >>> On tree page 3878 cell 38: invalid page number 5773 >>> On tree page 3878 cell 39: Failed to read ptrmap key=5775 >>> On tree page 3878 cell 39: invalid page number 5775 >>> On tree page 3878 cell 40: Failed to read ptrmap key=5777 >>> On tree page 3878 cell 40: invalid page number 5777 >>> On tree page 3878 cell 41: Failed to read ptrmap key=5780 >>> On tree page 3878 cell 41: invalid page number 5780 >>> On tree page 3878 cell 42: Failed to read ptrmap key=5783 >>> On tree page 3878 cell 42: invalid page number 5783 >>> On tree page 3878 cell 43: Failed to read ptrmap key=5787 >>> On tree page 3878 cell 43: invalid page number 5787 >>> On tree page 3878 cell 44: Failed to read ptrmap key=5789 >>> On tree page 3878 cell 44: invalid page number 5789 >>> On tree page 3878 cell 45: Failed to read ptrmap key=5793 >>> On tree page 3878 cell 45: invalid page number 5793 >>> On tree page 3878 cell 46: Failed to read ptrmap key=5795 >>> On tree page 3878 cell 46: invalid page number 5795 >>> On tree page 3878 cell 47: Failed to read ptrmap key=5797 >>> On tree page 3878 cell 47: invalid page number 5797 >>> On tree page 3878 cell 48: Failed to read ptrmap key=5801 >>> On tree page 3878 cell 48: invalid page number 5801 >>> On tree page 3878 cell 49: Failed to read ptrmap key=5805 >>> On tree page 3878 cell 49: invalid page number 5805 >>> On tree page 3878 cell 50: Failed to read ptrmap key=5807 >>> On tree page 3878 cell 50: invalid page number 5807 >>> On tree page 3878 cell 51: Failed to read ptrmap key=5810 >>> On tree page 3878 cell 51: invalid page number 5810 >>> On tree page 3878 cell 52: Failed to read ptrmap key=5813 >>> On tree page 3878 cell 52: invalid page number 5813 >>> On tree page 3878 cell 53: Failed to read ptrmap key=5817 >>> On tree page 3878 cell 53: invalid page number 5817 >>> On tree page 3878 cell 54: Failed to read ptrmap key=5822 >>> On tree page 3878 cell 54: invalid page number 5822 >>> On tree page 3878 cell 55: Failed to read ptrmap key=5826 >>> On tree page 3878 cell 55: invalid page number 5826 >>> On tree page 3878 cell 56: Failed to read ptrmap key=5831 >>> On tree page 3878 cell 56: invalid page number 5831 >>> On tree page 3878 cell 57: Failed to read ptrmap key=5836 >>> On tree page 3878 cell 57: invalid page number 5836 >>> On tree page 3878 cell 58: Failed to read ptrmap key=5838 >>> On tree page 3878 cell 58: invalid page number 5838 >>> On tree page 3878 cell 59: Failed to read ptrmap key=5841 >>> On tree page 3878 cell 59: invalid page number 5841 >>> On tree page 3878 cell 60: Failed to read ptrmap key=5843 >>> On tree page 3878 cell 60: invalid page number 5843 >>> On tree page 3878 cell 61: Failed to read ptrmap key=5846 >>> On tree page 3878 cell 61: invalid page number 5846 >>> On tree page 3878 cell 62: Failed to read ptrmap key=5848 >>> On tree page 3878 cell 62: invalid page number 5848 >>> On tree page 3878 cell 63: Failed to read ptrmap key=5850 >>> On tree page 3878 cell 63: invalid page number 5850 >>> On tree page 3878 cell 64: Failed to read ptrmap key=5853 >>> On tree page 3878 cell 64: invalid page number 5853 >>> On tree page 3878 cell 65: Failed to read ptrmap key=5857 >>> On tree page 3878 cell 65: invalid page number 5857 >>> On tree page 3878 cell 66: Failed to read ptrmap key=5859 >>> On tree page 3878 cell 66: invalid page number 5859 >>> On tree page 3878 cell 67: Failed to read ptrmap key=5862 >>> On tree page 3878 cell 67: invalid page number 5862 >>> On tree page 3878 cell 68: Failed to read ptrmap key=5868 >>> On tree page 3878 cell 68: invalid page number 5868 >>> On tree page 3878 cell 69: Failed to read ptrmap key=5870 >>> On tree page 3878 cell 69: invalid page number 5870 >>> On tree page 3878 cell 70: Failed to read ptrmap key=5874 >>> On tree page 3878 cell 70: invalid page number 5874 >>> On tree page 3878 cell 71: Failed to read ptrmap key=5876 >>> On tree page 3878 cell 71: invalid page number 5876 >>> On tree page 3878 cell 72: Bad ptr map entry key=4495 >>> expected=(5,3878) >>> got=(2,0 >>> ) >>> On tree page 4495 cell 0: Bad ptr map entry key=4492 >>> expected=(3,4495) >>> got=(2,0) >>> >>> On tree page 3878 cell 72: Child page depth differs >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users