Yes, I agree.  What I am now trying to find out is if things like
running a service or .NET service causes terminatethread to be called
behind the scenes as some kind of cleanup.  The testing was to show that
this can be a problem, to characterize why some dbs can get corrupted on
"normal exits"



-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Teg
Sent: Wednesday, October 14, 2009 12:10 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Corrupted database

Hello Chris,

It's always a mistake to use TerminateThread. Even Microsoft warns
against it.

>From MSDN:

- If the target thread owns a critical section, the critical section
will not be released.
- If the target thread is allocating memory from the heap, the heap lock
will not be released.
- If the target thread is executing certain kernel32 calls when it is
terminated, the kernel32 state for the thread's process could be
inconsistent.
- If the target thread is manipulating the global state of a shared DLL,
the state of the DLL could be destroyed, affecting other users of the
DLL.

Terminating a thread can hang the whole application of I read #1 and #2
correctly. If you use TerminateThread regularly, you should really
re-consider your design.

C


Wednesday, October 14, 2009, 11:29:29 AM, you wrote:

MC> Yes, if we are in the middle of a lot of updates/inserts and just
MC> terminate the thread, pragma integrity_check from the sqlite3
command
MC> line tool will report corruption at times.  Normally, when we hard
kill
MC> a thread in the middle of these ops, a journal is left behind.  I
think
MC> we only see corruption in this case (journal left behind), but
cannot be
MC> sure.  Our transactions can be large (a few thousand records
totaling a
MC> few megabytes).  

MC> Summary of steps:

MC> 1) hard-Kill a thread in the middle of inserting/updating a large
MC> transaction (2mb+ transaction).  
MC> 2) with nothing else running, do a pragma integrity_check in sqlite3
MC> command line client against the db.  Obviously after executing
sqlite3
MC> client, the journal disappears since I guess recovery ran.
MC> 3) integrity_check spits out lots of errors.

MC> Note that we have never had this happen with Synchronous=Full, only
MC> Synchronous=Off.  Have not tried normal.  That's all I was trying to
MC> tell the original person asking.  I don't know why this would
happen; it
MC> seems logical that once you execute a write to the FS, whether or
not
MC> the app crashes/spontaneously exits that the write would make it.
MC> However, all of this may be a red herring.. It turns out we moved to
MC> Synch=Full after we sped up our db operations greatly... see below
as to
MC> why that may be the factor (speed), and not what Synch is set to.

MC> By the way, this is all under windows.

MC> I'm assuming that sqlite3 uses sync i/o via writefile.  Here's a
quick
MC> read on sync IO for windows:
MC> http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx --
MC> specifically what worries me is that TerminateThread() looks like it
can
MC> interrupt an I/O operation.

MC> The problem I think may be people using TerminateThread();  that's
how
MC> you hard kill a thread.  It seems that can interrupt an I/O
operation --
MC> ie an operation that writes more than one cluster at a time.
Meaning,
MC> synch = full may have nothing to do with it.  If you have to say
write
MC> more than one cluster (disk block), TerminateThread looks like it
can
MC> abort the IO op in the middle of a multi-block op?  I'm trying to
run
MC> that down but can't yet find anything that verifies this. 

MC> So, here is what I think:
MC> 1) You need to write some data, lets say a 8K write.  In this
MC> theoretical example, disk blocks are 4k.  
MC> 2) You call WriteFile(8K).
MC> 3) WriteFile causes a system call, and the system schedules the
first
MC> bit of the i/o (1st 4k).
MC> 4) terminatethread() is called
MC> 5) I/O operation is cancelled (ie, as if CancelIO had been called?),
MC> meaning block #2 was never scheduled.  Database now corrupt.

MC> A lot of people using windows tend to have this kind of threading
MC> architecture:
MC> 1) Signal the thread to exit on its own
MC> 2) Wait for some grace period.
MC> 3) if grace period expires, and thread is still running --
MC> TerminateThread() because once the grace period expires, the thread
is
MC> considered hung.

MC> So, large transactions in a thread could cause people to use
MC> TerminateThread() at a critical time, especially if that causes the
MC> thread to go over its grace period.  For us, these large
transactions
MC> took longer than our grace period to complete, and thus were subject
to
MC> TerminateThread -- and lots of corruption.  Once we sped everything
up
MC> and moved to synch=full, no transaction was even close to the grace
MC> period, and such, terminatethread() is never called.. and we get no
MC> corruption.

MC> Just a thought.

MC> -----Original Message-----
MC> From: sqlite-users-boun...@sqlite.org
MC> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
MC> Sent: Wednesday, October 14, 2009 12:36 AM
MC> To: General Discussion of SQLite Database
MC> Subject: Re: [sqlite] Corrupted database


MC> On Oct 14, 2009, at 12:42 AM, McClellen, Chris wrote:

>> But it does happen and we can reproduce it.  Hard killing a thread is
>> essentially equivalent to turning off the power.

MC> We have always assumed that it is different. When you write data to
MC> a file, the write is buffered in volatile memory by the OS for a
time.
MC> If a power failure occurs during this time, the write is lost. But
if
MC> a thread is killed, the OS should still eventually make sure the
data
MC> gets to stable storage.

MC> If you kill the application, then open the database using the shell
MC> tool, is the database corrupted?

MC> Dan.



>> -----Original Message-----
>> From: sqlite-users-boun...@sqlite.org
>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
>> Sent: Tuesday, October 13, 2009 12:35 AM
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Corrupted database
>>
>>
>> On Oct 13, 2009, at 1:56 AM, McClellen, Chris wrote:
>>
>>> What is your synchronous set to?  Full?  FYI If you are using .NET
>>> data
>>> providers, it is set to "Normal" by default.
>>>
>>> If it is not set to full, I have seen corruption when an application
>>> crashes, or exits when a thread is in the middle of updating the db
>>> (Synchronous = OFF makes corruption even easier in this case).  I  
>>> have
>>> seen apps that do not wait for background threads to finish before
>>> termination, and without full sync on, either the db or the log gets
>>> corrupted.  A corrupted log can cause problems for you db on next
run
>>> when recovery happens.
>>
>> In theory, this shouldn't happen. Unless the application is actually
>> buffering data that SQLite thinks has been written to the database or
>> journal file in the process space on some systems.
>>
>> The "synchronous" setting should only make a difference in the event
>> of a power or OS failure. That's the theory, anyway.
>>
>> Dan.
>>
>>
>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: sqlite-users-boun...@sqlite.org
>>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Filip Navara
>>> Sent: Monday, October 12, 2009 12:38 PM
>>> To: General Discussion of SQLite Database
>>> Subject: [sqlite] Corrupted database
>>>
>>> Hello,
>>>
>>> for a few months we have been occasionally getting corrupted  
>>> databases
>>> in the field. So far we were unable to acquire any of them from our
>>> customers, but this week I finally got hold of one. Output from
>>> "pragma integrity_check" is included below.
>>>
>>> The schema is the following:
>>>
>>> CREATE VIRTUAL TABLE "LocalMailsIndex3" USING fts3 (id INTEGER,
>>> partName TEXT, content TEXT);
>>> CREATE TABLE LocalMailsIndex3_content(  docid INTEGER PRIMARY
>>> KEY,c0id, c1partName, c2content);
>>> CREATE TABLE LocalMailsIndex3_segdir(  level integer,  idx integer,
>>> start_block integer,  leaves_end_block integer,  end_block integer,
>>> root blob,  primary key(level, idx));
>>> CREATE TABLE LocalMailsIndex3_segments(  blockid INTEGER PRIMARY
KEY,
>>> block blob);
>>> CREATE INDEX "LocalMailsIndex3_contentIndex" ON
>>> "LocalMailsIndex3_content" ("c0id", "c1partName");
>>>
>>> The database is created using SQLite 3.6.14.2, thread safe, on  
>>> Windows
>>> with auto_vacuum=incremental. It is always opened as attached  
>>> database
>>> with journal_mode=persist. Application crashes were most probably
>>> involved, but no operating system / power crashes as far as I know.
>>>
>>> One thread in the application is periodically running "pragma
>>> freelist_count" and "pragma incremental_vacuum(...)". Other threads
>>> are running combination of the following commands and no other:
>>>
>>> INSERT INTO mail_fti.LocalMailsIndex3("id", "partName", "content")
>>> VALUES (@id, @partName, @content)
>>> SELECT c2content AS content FROM mail_fti.LocalMailsIndex3_content
>>> WHERE c0...@id AND c1partna...@partname
>>> SELECT docid FROM mail_fti.LocalMailsIndex3_content WHERE c0id IN
>>> (...)
>>> DELETE FROM mail_fti.LocalMailsIndex3 WHERE doc...@docid
>>> SELECT ... WHERE "id" IN (SELECT "id" FROM mail_fti.LocalMailsIndex3
>>> WHERE content MATCH "...")
>>>
>>> Anybody has seen something like this?
>>> Anybody willing to look at it? I can send the database privately.
>>>
>>> Best regards,
>>> Filip Navara
>>>
>>> Main freelist: Bad ptr map entry key=5143 expected=(2,0)
got=(3,4467)
>>> Main freelist: freelist leaf count too big on page 5143
>>> Main freelist: Bad ptr map entry key=5449 expected=(2,0)
got=(4,5143)
>>> Main freelist: freelist leaf count too big on page 5449
>>> Main freelist: 904 of 908 pages missing from overflow list starting

>>> at
>>> 5143
>>> On tree page 3878 cell 26: invalid page number 5737
>>> On tree page 3878 cell 26: Child page depth differs
>>> On tree page 3878 cell 27: Failed to read ptrmap key=5746
>>> On tree page 3878 cell 27: invalid page number 5746
>>> On tree page 3878 cell 28: Failed to read ptrmap key=5748
>>> On tree page 3878 cell 28: invalid page number 5748
>>> On tree page 3878 cell 29: Failed to read ptrmap key=5749
>>> On tree page 3878 cell 29: invalid page number 5749
>>> On tree page 3878 cell 30: Failed to read ptrmap key=5755
>>> On tree page 3878 cell 30: invalid page number 5755
>>> On tree page 3878 cell 31: Failed to read ptrmap key=5757
>>> On tree page 3878 cell 31: invalid page number 5757
>>> On tree page 3878 cell 32: Failed to read ptrmap key=5759
>>> On tree page 3878 cell 32: invalid page number 5759
>>> On tree page 3878 cell 33: Failed to read ptrmap key=5761
>>> On tree page 3878 cell 33: invalid page number 5761
>>> On tree page 3878 cell 34: Failed to read ptrmap key=5763
>>> On tree page 3878 cell 34: invalid page number 5763
>>> On tree page 3878 cell 35: Failed to read ptrmap key=5767
>>> On tree page 3878 cell 35: invalid page number 5767
>>> On tree page 3878 cell 36: Failed to read ptrmap key=5769
>>> On tree page 3878 cell 36: invalid page number 5769
>>> On tree page 3878 cell 37: Failed to read ptrmap key=5771
>>> On tree page 3878 cell 37: invalid page number 5771
>>> On tree page 3878 cell 38: Failed to read ptrmap key=5773
>>> On tree page 3878 cell 38: invalid page number 5773
>>> On tree page 3878 cell 39: Failed to read ptrmap key=5775
>>> On tree page 3878 cell 39: invalid page number 5775
>>> On tree page 3878 cell 40: Failed to read ptrmap key=5777
>>> On tree page 3878 cell 40: invalid page number 5777
>>> On tree page 3878 cell 41: Failed to read ptrmap key=5780
>>> On tree page 3878 cell 41: invalid page number 5780
>>> On tree page 3878 cell 42: Failed to read ptrmap key=5783
>>> On tree page 3878 cell 42: invalid page number 5783
>>> On tree page 3878 cell 43: Failed to read ptrmap key=5787
>>> On tree page 3878 cell 43: invalid page number 5787
>>> On tree page 3878 cell 44: Failed to read ptrmap key=5789
>>> On tree page 3878 cell 44: invalid page number 5789
>>> On tree page 3878 cell 45: Failed to read ptrmap key=5793
>>> On tree page 3878 cell 45: invalid page number 5793
>>> On tree page 3878 cell 46: Failed to read ptrmap key=5795
>>> On tree page 3878 cell 46: invalid page number 5795
>>> On tree page 3878 cell 47: Failed to read ptrmap key=5797
>>> On tree page 3878 cell 47: invalid page number 5797
>>> On tree page 3878 cell 48: Failed to read ptrmap key=5801
>>> On tree page 3878 cell 48: invalid page number 5801
>>> On tree page 3878 cell 49: Failed to read ptrmap key=5805
>>> On tree page 3878 cell 49: invalid page number 5805
>>> On tree page 3878 cell 50: Failed to read ptrmap key=5807
>>> On tree page 3878 cell 50: invalid page number 5807
>>> On tree page 3878 cell 51: Failed to read ptrmap key=5810
>>> On tree page 3878 cell 51: invalid page number 5810
>>> On tree page 3878 cell 52: Failed to read ptrmap key=5813
>>> On tree page 3878 cell 52: invalid page number 5813
>>> On tree page 3878 cell 53: Failed to read ptrmap key=5817
>>> On tree page 3878 cell 53: invalid page number 5817
>>> On tree page 3878 cell 54: Failed to read ptrmap key=5822
>>> On tree page 3878 cell 54: invalid page number 5822
>>> On tree page 3878 cell 55: Failed to read ptrmap key=5826
>>> On tree page 3878 cell 55: invalid page number 5826
>>> On tree page 3878 cell 56: Failed to read ptrmap key=5831
>>> On tree page 3878 cell 56: invalid page number 5831
>>> On tree page 3878 cell 57: Failed to read ptrmap key=5836
>>> On tree page 3878 cell 57: invalid page number 5836
>>> On tree page 3878 cell 58: Failed to read ptrmap key=5838
>>> On tree page 3878 cell 58: invalid page number 5838
>>> On tree page 3878 cell 59: Failed to read ptrmap key=5841
>>> On tree page 3878 cell 59: invalid page number 5841
>>> On tree page 3878 cell 60: Failed to read ptrmap key=5843
>>> On tree page 3878 cell 60: invalid page number 5843
>>> On tree page 3878 cell 61: Failed to read ptrmap key=5846
>>> On tree page 3878 cell 61: invalid page number 5846
>>> On tree page 3878 cell 62: Failed to read ptrmap key=5848
>>> On tree page 3878 cell 62: invalid page number 5848
>>> On tree page 3878 cell 63: Failed to read ptrmap key=5850
>>> On tree page 3878 cell 63: invalid page number 5850
>>> On tree page 3878 cell 64: Failed to read ptrmap key=5853
>>> On tree page 3878 cell 64: invalid page number 5853
>>> On tree page 3878 cell 65: Failed to read ptrmap key=5857
>>> On tree page 3878 cell 65: invalid page number 5857
>>> On tree page 3878 cell 66: Failed to read ptrmap key=5859
>>> On tree page 3878 cell 66: invalid page number 5859
>>> On tree page 3878 cell 67: Failed to read ptrmap key=5862
>>> On tree page 3878 cell 67: invalid page number 5862
>>> On tree page 3878 cell 68: Failed to read ptrmap key=5868
>>> On tree page 3878 cell 68: invalid page number 5868
>>> On tree page 3878 cell 69: Failed to read ptrmap key=5870
>>> On tree page 3878 cell 69: invalid page number 5870
>>> On tree page 3878 cell 70: Failed to read ptrmap key=5874
>>> On tree page 3878 cell 70: invalid page number 5874
>>> On tree page 3878 cell 71: Failed to read ptrmap key=5876
>>> On tree page 3878 cell 71: invalid page number 5876
>>> On tree page 3878 cell 72: Bad ptr map entry key=4495
>>> expected=(5,3878)
>>> got=(2,0
>>> )
>>> On tree page 4495 cell 0: Bad ptr map entry key=4492  
>>> expected=(3,4495)
>>> got=(2,0)
>>>
>>> On tree page 3878 cell 72: Child page depth differs
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

MC> _______________________________________________
MC> sqlite-users mailing list
MC> sqlite-users@sqlite.org
MC> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
MC> _______________________________________________
MC> sqlite-users mailing list
MC> sqlite-users@sqlite.org
MC> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Best regards,
 Teg                            mailto:t...@djii.com

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to