Yes, I agree. What I am now trying to find out is if things like running a service or .NET service causes terminatethread to be called behind the scenes as some kind of cleanup. The testing was to show that this can be a problem, to characterize why some dbs can get corrupted on "normal exits"
-----Original Message----- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Teg Sent: Wednesday, October 14, 2009 12:10 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Corrupted database Hello Chris, It's always a mistake to use TerminateThread. Even Microsoft warns against it. >From MSDN: - If the target thread owns a critical section, the critical section will not be released. - If the target thread is allocating memory from the heap, the heap lock will not be released. - If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent. - If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL. Terminating a thread can hang the whole application of I read #1 and #2 correctly. If you use TerminateThread regularly, you should really re-consider your design. C Wednesday, October 14, 2009, 11:29:29 AM, you wrote: MC> Yes, if we are in the middle of a lot of updates/inserts and just MC> terminate the thread, pragma integrity_check from the sqlite3 command MC> line tool will report corruption at times. Normally, when we hard kill MC> a thread in the middle of these ops, a journal is left behind. I think MC> we only see corruption in this case (journal left behind), but cannot be MC> sure. Our transactions can be large (a few thousand records totaling a MC> few megabytes). MC> Summary of steps: MC> 1) hard-Kill a thread in the middle of inserting/updating a large MC> transaction (2mb+ transaction). MC> 2) with nothing else running, do a pragma integrity_check in sqlite3 MC> command line client against the db. Obviously after executing sqlite3 MC> client, the journal disappears since I guess recovery ran. MC> 3) integrity_check spits out lots of errors. MC> Note that we have never had this happen with Synchronous=Full, only MC> Synchronous=Off. Have not tried normal. That's all I was trying to MC> tell the original person asking. I don't know why this would happen; it MC> seems logical that once you execute a write to the FS, whether or not MC> the app crashes/spontaneously exits that the write would make it. MC> However, all of this may be a red herring.. It turns out we moved to MC> Synch=Full after we sped up our db operations greatly... see below as to MC> why that may be the factor (speed), and not what Synch is set to. MC> By the way, this is all under windows. MC> I'm assuming that sqlite3 uses sync i/o via writefile. Here's a quick MC> read on sync IO for windows: MC> http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx -- MC> specifically what worries me is that TerminateThread() looks like it can MC> interrupt an I/O operation. MC> The problem I think may be people using TerminateThread(); that's how MC> you hard kill a thread. It seems that can interrupt an I/O operation -- MC> ie an operation that writes more than one cluster at a time. Meaning, MC> synch = full may have nothing to do with it. If you have to say write MC> more than one cluster (disk block), TerminateThread looks like it can MC> abort the IO op in the middle of a multi-block op? I'm trying to run MC> that down but can't yet find anything that verifies this. MC> So, here is what I think: MC> 1) You need to write some data, lets say a 8K write. In this MC> theoretical example, disk blocks are 4k. MC> 2) You call WriteFile(8K). MC> 3) WriteFile causes a system call, and the system schedules the first MC> bit of the i/o (1st 4k). MC> 4) terminatethread() is called MC> 5) I/O operation is cancelled (ie, as if CancelIO had been called?), MC> meaning block #2 was never scheduled. Database now corrupt. MC> A lot of people using windows tend to have this kind of threading MC> architecture: MC> 1) Signal the thread to exit on its own MC> 2) Wait for some grace period. MC> 3) if grace period expires, and thread is still running -- MC> TerminateThread() because once the grace period expires, the thread is MC> considered hung. MC> So, large transactions in a thread could cause people to use MC> TerminateThread() at a critical time, especially if that causes the MC> thread to go over its grace period. For us, these large transactions MC> took longer than our grace period to complete, and thus were subject to MC> TerminateThread -- and lots of corruption. Once we sped everything up MC> and moved to synch=full, no transaction was even close to the grace MC> period, and such, terminatethread() is never called.. and we get no MC> corruption. MC> Just a thought. MC> -----Original Message----- MC> From: sqlite-users-boun...@sqlite.org MC> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy MC> Sent: Wednesday, October 14, 2009 12:36 AM MC> To: General Discussion of SQLite Database MC> Subject: Re: [sqlite] Corrupted database MC> On Oct 14, 2009, at 12:42 AM, McClellen, Chris wrote: >> But it does happen and we can reproduce it. Hard killing a thread is >> essentially equivalent to turning off the power. MC> We have always assumed that it is different. When you write data to MC> a file, the write is buffered in volatile memory by the OS for a time. MC> If a power failure occurs during this time, the write is lost. But if MC> a thread is killed, the OS should still eventually make sure the data MC> gets to stable storage. MC> If you kill the application, then open the database using the shell MC> tool, is the database corrupted? MC> Dan. >> -----Original Message----- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy >> Sent: Tuesday, October 13, 2009 12:35 AM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] Corrupted database >> >> >> On Oct 13, 2009, at 1:56 AM, McClellen, Chris wrote: >> >>> What is your synchronous set to? Full? FYI If you are using .NET >>> data >>> providers, it is set to "Normal" by default. >>> >>> If it is not set to full, I have seen corruption when an application >>> crashes, or exits when a thread is in the middle of updating the db >>> (Synchronous = OFF makes corruption even easier in this case). I >>> have >>> seen apps that do not wait for background threads to finish before >>> termination, and without full sync on, either the db or the log gets >>> corrupted. A corrupted log can cause problems for you db on next run >>> when recovery happens. >> >> In theory, this shouldn't happen. Unless the application is actually >> buffering data that SQLite thinks has been written to the database or >> journal file in the process space on some systems. >> >> The "synchronous" setting should only make a difference in the event >> of a power or OS failure. That's the theory, anyway. >> >> Dan. >> >> >> >>> >>> >>> >>> -----Original Message----- >>> From: sqlite-users-boun...@sqlite.org >>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Filip Navara >>> Sent: Monday, October 12, 2009 12:38 PM >>> To: General Discussion of SQLite Database >>> Subject: [sqlite] Corrupted database >>> >>> Hello, >>> >>> for a few months we have been occasionally getting corrupted >>> databases >>> in the field. So far we were unable to acquire any of them from our >>> customers, but this week I finally got hold of one. Output from >>> "pragma integrity_check" is included below. >>> >>> The schema is the following: >>> >>> CREATE VIRTUAL TABLE "LocalMailsIndex3" USING fts3 (id INTEGER, >>> partName TEXT, content TEXT); >>> CREATE TABLE LocalMailsIndex3_content( docid INTEGER PRIMARY >>> KEY,c0id, c1partName, c2content); >>> CREATE TABLE LocalMailsIndex3_segdir( level integer, idx integer, >>> start_block integer, leaves_end_block integer, end_block integer, >>> root blob, primary key(level, idx)); >>> CREATE TABLE LocalMailsIndex3_segments( blockid INTEGER PRIMARY KEY, >>> block blob); >>> CREATE INDEX "LocalMailsIndex3_contentIndex" ON >>> "LocalMailsIndex3_content" ("c0id", "c1partName"); >>> >>> The database is created using SQLite 3.6.14.2, thread safe, on >>> Windows >>> with auto_vacuum=incremental. It is always opened as attached >>> database >>> with journal_mode=persist. Application crashes were most probably >>> involved, but no operating system / power crashes as far as I know. >>> >>> One thread in the application is periodically running "pragma >>> freelist_count" and "pragma incremental_vacuum(...)". Other threads >>> are running combination of the following commands and no other: >>> >>> INSERT INTO mail_fti.LocalMailsIndex3("id", "partName", "content") >>> VALUES (@id, @partName, @content) >>> SELECT c2content AS content FROM mail_fti.LocalMailsIndex3_content >>> WHERE c0...@id AND c1partna...@partname >>> SELECT docid FROM mail_fti.LocalMailsIndex3_content WHERE c0id IN >>> (...) >>> DELETE FROM mail_fti.LocalMailsIndex3 WHERE doc...@docid >>> SELECT ... WHERE "id" IN (SELECT "id" FROM mail_fti.LocalMailsIndex3 >>> WHERE content MATCH "...") >>> >>> Anybody has seen something like this? >>> Anybody willing to look at it? I can send the database privately. >>> >>> Best regards, >>> Filip Navara >>> >>> Main freelist: Bad ptr map entry key=5143 expected=(2,0) got=(3,4467) >>> Main freelist: freelist leaf count too big on page 5143 >>> Main freelist: Bad ptr map entry key=5449 expected=(2,0) got=(4,5143) >>> Main freelist: freelist leaf count too big on page 5449 >>> Main freelist: 904 of 908 pages missing from overflow list starting >>> at >>> 5143 >>> On tree page 3878 cell 26: invalid page number 5737 >>> On tree page 3878 cell 26: Child page depth differs >>> On tree page 3878 cell 27: Failed to read ptrmap key=5746 >>> On tree page 3878 cell 27: invalid page number 5746 >>> On tree page 3878 cell 28: Failed to read ptrmap key=5748 >>> On tree page 3878 cell 28: invalid page number 5748 >>> On tree page 3878 cell 29: Failed to read ptrmap key=5749 >>> On tree page 3878 cell 29: invalid page number 5749 >>> On tree page 3878 cell 30: Failed to read ptrmap key=5755 >>> On tree page 3878 cell 30: invalid page number 5755 >>> On tree page 3878 cell 31: Failed to read ptrmap key=5757 >>> On tree page 3878 cell 31: invalid page number 5757 >>> On tree page 3878 cell 32: Failed to read ptrmap key=5759 >>> On tree page 3878 cell 32: invalid page number 5759 >>> On tree page 3878 cell 33: Failed to read ptrmap key=5761 >>> On tree page 3878 cell 33: invalid page number 5761 >>> On tree page 3878 cell 34: Failed to read ptrmap key=5763 >>> On tree page 3878 cell 34: invalid page number 5763 >>> On tree page 3878 cell 35: Failed to read ptrmap key=5767 >>> On tree page 3878 cell 35: invalid page number 5767 >>> On tree page 3878 cell 36: Failed to read ptrmap key=5769 >>> On tree page 3878 cell 36: invalid page number 5769 >>> On tree page 3878 cell 37: Failed to read ptrmap key=5771 >>> On tree page 3878 cell 37: invalid page number 5771 >>> On tree page 3878 cell 38: Failed to read ptrmap key=5773 >>> On tree page 3878 cell 38: invalid page number 5773 >>> On tree page 3878 cell 39: Failed to read ptrmap key=5775 >>> On tree page 3878 cell 39: invalid page number 5775 >>> On tree page 3878 cell 40: Failed to read ptrmap key=5777 >>> On tree page 3878 cell 40: invalid page number 5777 >>> On tree page 3878 cell 41: Failed to read ptrmap key=5780 >>> On tree page 3878 cell 41: invalid page number 5780 >>> On tree page 3878 cell 42: Failed to read ptrmap key=5783 >>> On tree page 3878 cell 42: invalid page number 5783 >>> On tree page 3878 cell 43: Failed to read ptrmap key=5787 >>> On tree page 3878 cell 43: invalid page number 5787 >>> On tree page 3878 cell 44: Failed to read ptrmap key=5789 >>> On tree page 3878 cell 44: invalid page number 5789 >>> On tree page 3878 cell 45: Failed to read ptrmap key=5793 >>> On tree page 3878 cell 45: invalid page number 5793 >>> On tree page 3878 cell 46: Failed to read ptrmap key=5795 >>> On tree page 3878 cell 46: invalid page number 5795 >>> On tree page 3878 cell 47: Failed to read ptrmap key=5797 >>> On tree page 3878 cell 47: invalid page number 5797 >>> On tree page 3878 cell 48: Failed to read ptrmap key=5801 >>> On tree page 3878 cell 48: invalid page number 5801 >>> On tree page 3878 cell 49: Failed to read ptrmap key=5805 >>> On tree page 3878 cell 49: invalid page number 5805 >>> On tree page 3878 cell 50: Failed to read ptrmap key=5807 >>> On tree page 3878 cell 50: invalid page number 5807 >>> On tree page 3878 cell 51: Failed to read ptrmap key=5810 >>> On tree page 3878 cell 51: invalid page number 5810 >>> On tree page 3878 cell 52: Failed to read ptrmap key=5813 >>> On tree page 3878 cell 52: invalid page number 5813 >>> On tree page 3878 cell 53: Failed to read ptrmap key=5817 >>> On tree page 3878 cell 53: invalid page number 5817 >>> On tree page 3878 cell 54: Failed to read ptrmap key=5822 >>> On tree page 3878 cell 54: invalid page number 5822 >>> On tree page 3878 cell 55: Failed to read ptrmap key=5826 >>> On tree page 3878 cell 55: invalid page number 5826 >>> On tree page 3878 cell 56: Failed to read ptrmap key=5831 >>> On tree page 3878 cell 56: invalid page number 5831 >>> On tree page 3878 cell 57: Failed to read ptrmap key=5836 >>> On tree page 3878 cell 57: invalid page number 5836 >>> On tree page 3878 cell 58: Failed to read ptrmap key=5838 >>> On tree page 3878 cell 58: invalid page number 5838 >>> On tree page 3878 cell 59: Failed to read ptrmap key=5841 >>> On tree page 3878 cell 59: invalid page number 5841 >>> On tree page 3878 cell 60: Failed to read ptrmap key=5843 >>> On tree page 3878 cell 60: invalid page number 5843 >>> On tree page 3878 cell 61: Failed to read ptrmap key=5846 >>> On tree page 3878 cell 61: invalid page number 5846 >>> On tree page 3878 cell 62: Failed to read ptrmap key=5848 >>> On tree page 3878 cell 62: invalid page number 5848 >>> On tree page 3878 cell 63: Failed to read ptrmap key=5850 >>> On tree page 3878 cell 63: invalid page number 5850 >>> On tree page 3878 cell 64: Failed to read ptrmap key=5853 >>> On tree page 3878 cell 64: invalid page number 5853 >>> On tree page 3878 cell 65: Failed to read ptrmap key=5857 >>> On tree page 3878 cell 65: invalid page number 5857 >>> On tree page 3878 cell 66: Failed to read ptrmap key=5859 >>> On tree page 3878 cell 66: invalid page number 5859 >>> On tree page 3878 cell 67: Failed to read ptrmap key=5862 >>> On tree page 3878 cell 67: invalid page number 5862 >>> On tree page 3878 cell 68: Failed to read ptrmap key=5868 >>> On tree page 3878 cell 68: invalid page number 5868 >>> On tree page 3878 cell 69: Failed to read ptrmap key=5870 >>> On tree page 3878 cell 69: invalid page number 5870 >>> On tree page 3878 cell 70: Failed to read ptrmap key=5874 >>> On tree page 3878 cell 70: invalid page number 5874 >>> On tree page 3878 cell 71: Failed to read ptrmap key=5876 >>> On tree page 3878 cell 71: invalid page number 5876 >>> On tree page 3878 cell 72: Bad ptr map entry key=4495 >>> expected=(5,3878) >>> got=(2,0 >>> ) >>> On tree page 4495 cell 0: Bad ptr map entry key=4492 >>> expected=(3,4495) >>> got=(2,0) >>> >>> On tree page 3878 cell 72: Child page depth differs >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users MC> _______________________________________________ MC> sqlite-users mailing list MC> sqlite-users@sqlite.org MC> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users MC> _______________________________________________ MC> sqlite-users mailing list MC> sqlite-users@sqlite.org MC> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Best regards, Teg mailto:t...@djii.com _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users