> The problem I think may be people using TerminateThread();  that's how
> you hard kill a thread.  It seems that can interrupt an I/O  
> operation --
> ie an operation that writes more than one cluster at a time.  Meaning,
> synch = full may have nothing to do with it.  If you have to say write
> more than one cluster (disk block), TerminateThread looks like it can
> abort the IO op in the middle of a multi-block op?  I'm trying to run
> that down but can't yet find anything that verifies this.

Even if it does, which seems quite plausible, the only way
I can see this causing corruption is if you are in persistent
journal mode and a (weak) checksum gives you a false positive
on the last, corrupted, record in the journal file.

Maybe if you're using a version from a few years back it could
happen without persistent-journal mode too.

It's easy to be wrong about this kind of thing though.

Dan.



> So, here is what I think:
> 1) You need to write some data, lets say a 8K write.  In this
> theoretical example, disk blocks are 4k.
> 2) You call WriteFile(8K).
> 3) WriteFile causes a system call, and the system schedules the first
> bit of the i/o (1st 4k).
> 4) terminatethread() is called
> 5) I/O operation is cancelled (ie, as if CancelIO had been called?),
> meaning block #2 was never scheduled.  Database now corrupt.
>
> A lot of people using windows tend to have this kind of threading
> architecture:
> 1) Signal the thread to exit on its own
> 2) Wait for some grace period.
> 3) if grace period expires, and thread is still running --
> TerminateThread() because once the grace period expires, the thread is
> considered hung.
>
> So, large transactions in a thread could cause people to use
> TerminateThread() at a critical time, especially if that causes the
> thread to go over its grace period.  For us, these large transactions
> took longer than our grace period to complete, and thus were subject  
> to
> TerminateThread -- and lots of corruption.  Once we sped everything up
> and moved to synch=full, no transaction was even close to the grace
> period, and such, terminatethread() is never called.. and we get no
> corruption.
>
> Just a thought.
>
> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
> Sent: Wednesday, October 14, 2009 12:36 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Corrupted database
>
>
> On Oct 14, 2009, at 12:42 AM, McClellen, Chris wrote:
>
>> But it does happen and we can reproduce it.  Hard killing a thread is
>> essentially equivalent to turning off the power.
>
> We have always assumed that it is different. When you write data to
> a file, the write is buffered in volatile memory by the OS for a time.
> If a power failure occurs during this time, the write is lost. But if
> a thread is killed, the OS should still eventually make sure the data
> gets to stable storage.
>
> If you kill the application, then open the database using the shell
> tool, is the database corrupted?
>
> Dan.
>
>
>
>> -----Original Message-----
>> From: sqlite-users-boun...@sqlite.org
>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
>> Sent: Tuesday, October 13, 2009 12:35 AM
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Corrupted database
>>
>>
>> On Oct 13, 2009, at 1:56 AM, McClellen, Chris wrote:
>>
>>> What is your synchronous set to?  Full?  FYI If you are using .NET
>>> data
>>> providers, it is set to "Normal" by default.
>>>
>>> If it is not set to full, I have seen corruption when an application
>>> crashes, or exits when a thread is in the middle of updating the db
>>> (Synchronous = OFF makes corruption even easier in this case).  I
>>> have
>>> seen apps that do not wait for background threads to finish before
>>> termination, and without full sync on, either the db or the log gets
>>> corrupted.  A corrupted log can cause problems for you db on next  
>>> run
>>> when recovery happens.
>>
>> In theory, this shouldn't happen. Unless the application is actually
>> buffering data that SQLite thinks has been written to the database or
>> journal file in the process space on some systems.
>>
>> The "synchronous" setting should only make a difference in the event
>> of a power or OS failure. That's the theory, anyway.
>>
>> Dan.
>>
>>
>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: sqlite-users-boun...@sqlite.org
>>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Filip Navara
>>> Sent: Monday, October 12, 2009 12:38 PM
>>> To: General Discussion of SQLite Database
>>> Subject: [sqlite] Corrupted database
>>>
>>> Hello,
>>>
>>> for a few months we have been occasionally getting corrupted
>>> databases
>>> in the field. So far we were unable to acquire any of them from our
>>> customers, but this week I finally got hold of one. Output from
>>> "pragma integrity_check" is included below.
>>>
>>> The schema is the following:
>>>
>>> CREATE VIRTUAL TABLE "LocalMailsIndex3" USING fts3 (id INTEGER,
>>> partName TEXT, content TEXT);
>>> CREATE TABLE LocalMailsIndex3_content(  docid INTEGER PRIMARY
>>> KEY,c0id, c1partName, c2content);
>>> CREATE TABLE LocalMailsIndex3_segdir(  level integer,  idx integer,
>>> start_block integer,  leaves_end_block integer,  end_block integer,
>>> root blob,  primary key(level, idx));
>>> CREATE TABLE LocalMailsIndex3_segments(  blockid INTEGER PRIMARY  
>>> KEY,
>>> block blob);
>>> CREATE INDEX "LocalMailsIndex3_contentIndex" ON
>>> "LocalMailsIndex3_content" ("c0id", "c1partName");
>>>
>>> The database is created using SQLite 3.6.14.2, thread safe, on
>>> Windows
>>> with auto_vacuum=incremental. It is always opened as attached
>>> database
>>> with journal_mode=persist. Application crashes were most probably
>>> involved, but no operating system / power crashes as far as I know.
>>>
>>> One thread in the application is periodically running "pragma
>>> freelist_count" and "pragma incremental_vacuum(...)". Other threads
>>> are running combination of the following commands and no other:
>>>
>>> INSERT INTO mail_fti.LocalMailsIndex3("id", "partName", "content")
>>> VALUES (@id, @partName, @content)
>>> SELECT c2content AS content FROM mail_fti.LocalMailsIndex3_content
>>> WHERE c0...@id AND c1partna...@partname
>>> SELECT docid FROM mail_fti.LocalMailsIndex3_content WHERE c0id IN
>>> (...)
>>> DELETE FROM mail_fti.LocalMailsIndex3 WHERE doc...@docid
>>> SELECT ... WHERE "id" IN (SELECT "id" FROM mail_fti.LocalMailsIndex3
>>> WHERE content MATCH "...")
>>>
>>> Anybody has seen something like this?
>>> Anybody willing to look at it? I can send the database privately.
>>>
>>> Best regards,
>>> Filip Navara
>>>
>>> Main freelist: Bad ptr map entry key=5143 expected=(2,0)  
>>> got=(3,4467)
>>> Main freelist: freelist leaf count too big on page 5143
>>> Main freelist: Bad ptr map entry key=5449 expected=(2,0)  
>>> got=(4,5143)
>>> Main freelist: freelist leaf count too big on page 5449
>>> Main freelist: 904 of 908 pages missing from overflow list starting
>>> at
>>> 5143
>>> On tree page 3878 cell 26: invalid page number 5737
>>> On tree page 3878 cell 26: Child page depth differs
>>> On tree page 3878 cell 27: Failed to read ptrmap key=5746
>>> On tree page 3878 cell 27: invalid page number 5746
>>> On tree page 3878 cell 28: Failed to read ptrmap key=5748
>>> On tree page 3878 cell 28: invalid page number 5748
>>> On tree page 3878 cell 29: Failed to read ptrmap key=5749
>>> On tree page 3878 cell 29: invalid page number 5749
>>> On tree page 3878 cell 30: Failed to read ptrmap key=5755
>>> On tree page 3878 cell 30: invalid page number 5755
>>> On tree page 3878 cell 31: Failed to read ptrmap key=5757
>>> On tree page 3878 cell 31: invalid page number 5757
>>> On tree page 3878 cell 32: Failed to read ptrmap key=5759
>>> On tree page 3878 cell 32: invalid page number 5759
>>> On tree page 3878 cell 33: Failed to read ptrmap key=5761
>>> On tree page 3878 cell 33: invalid page number 5761
>>> On tree page 3878 cell 34: Failed to read ptrmap key=5763
>>> On tree page 3878 cell 34: invalid page number 5763
>>> On tree page 3878 cell 35: Failed to read ptrmap key=5767
>>> On tree page 3878 cell 35: invalid page number 5767
>>> On tree page 3878 cell 36: Failed to read ptrmap key=5769
>>> On tree page 3878 cell 36: invalid page number 5769
>>> On tree page 3878 cell 37: Failed to read ptrmap key=5771
>>> On tree page 3878 cell 37: invalid page number 5771
>>> On tree page 3878 cell 38: Failed to read ptrmap key=5773
>>> On tree page 3878 cell 38: invalid page number 5773
>>> On tree page 3878 cell 39: Failed to read ptrmap key=5775
>>> On tree page 3878 cell 39: invalid page number 5775
>>> On tree page 3878 cell 40: Failed to read ptrmap key=5777
>>> On tree page 3878 cell 40: invalid page number 5777
>>> On tree page 3878 cell 41: Failed to read ptrmap key=5780
>>> On tree page 3878 cell 41: invalid page number 5780
>>> On tree page 3878 cell 42: Failed to read ptrmap key=5783
>>> On tree page 3878 cell 42: invalid page number 5783
>>> On tree page 3878 cell 43: Failed to read ptrmap key=5787
>>> On tree page 3878 cell 43: invalid page number 5787
>>> On tree page 3878 cell 44: Failed to read ptrmap key=5789
>>> On tree page 3878 cell 44: invalid page number 5789
>>> On tree page 3878 cell 45: Failed to read ptrmap key=5793
>>> On tree page 3878 cell 45: invalid page number 5793
>>> On tree page 3878 cell 46: Failed to read ptrmap key=5795
>>> On tree page 3878 cell 46: invalid page number 5795
>>> On tree page 3878 cell 47: Failed to read ptrmap key=5797
>>> On tree page 3878 cell 47: invalid page number 5797
>>> On tree page 3878 cell 48: Failed to read ptrmap key=5801
>>> On tree page 3878 cell 48: invalid page number 5801
>>> On tree page 3878 cell 49: Failed to read ptrmap key=5805
>>> On tree page 3878 cell 49: invalid page number 5805
>>> On tree page 3878 cell 50: Failed to read ptrmap key=5807
>>> On tree page 3878 cell 50: invalid page number 5807
>>> On tree page 3878 cell 51: Failed to read ptrmap key=5810
>>> On tree page 3878 cell 51: invalid page number 5810
>>> On tree page 3878 cell 52: Failed to read ptrmap key=5813
>>> On tree page 3878 cell 52: invalid page number 5813
>>> On tree page 3878 cell 53: Failed to read ptrmap key=5817
>>> On tree page 3878 cell 53: invalid page number 5817
>>> On tree page 3878 cell 54: Failed to read ptrmap key=5822
>>> On tree page 3878 cell 54: invalid page number 5822
>>> On tree page 3878 cell 55: Failed to read ptrmap key=5826
>>> On tree page 3878 cell 55: invalid page number 5826
>>> On tree page 3878 cell 56: Failed to read ptrmap key=5831
>>> On tree page 3878 cell 56: invalid page number 5831
>>> On tree page 3878 cell 57: Failed to read ptrmap key=5836
>>> On tree page 3878 cell 57: invalid page number 5836
>>> On tree page 3878 cell 58: Failed to read ptrmap key=5838
>>> On tree page 3878 cell 58: invalid page number 5838
>>> On tree page 3878 cell 59: Failed to read ptrmap key=5841
>>> On tree page 3878 cell 59: invalid page number 5841
>>> On tree page 3878 cell 60: Failed to read ptrmap key=5843
>>> On tree page 3878 cell 60: invalid page number 5843
>>> On tree page 3878 cell 61: Failed to read ptrmap key=5846
>>> On tree page 3878 cell 61: invalid page number 5846
>>> On tree page 3878 cell 62: Failed to read ptrmap key=5848
>>> On tree page 3878 cell 62: invalid page number 5848
>>> On tree page 3878 cell 63: Failed to read ptrmap key=5850
>>> On tree page 3878 cell 63: invalid page number 5850
>>> On tree page 3878 cell 64: Failed to read ptrmap key=5853
>>> On tree page 3878 cell 64: invalid page number 5853
>>> On tree page 3878 cell 65: Failed to read ptrmap key=5857
>>> On tree page 3878 cell 65: invalid page number 5857
>>> On tree page 3878 cell 66: Failed to read ptrmap key=5859
>>> On tree page 3878 cell 66: invalid page number 5859
>>> On tree page 3878 cell 67: Failed to read ptrmap key=5862
>>> On tree page 3878 cell 67: invalid page number 5862
>>> On tree page 3878 cell 68: Failed to read ptrmap key=5868
>>> On tree page 3878 cell 68: invalid page number 5868
>>> On tree page 3878 cell 69: Failed to read ptrmap key=5870
>>> On tree page 3878 cell 69: invalid page number 5870
>>> On tree page 3878 cell 70: Failed to read ptrmap key=5874
>>> On tree page 3878 cell 70: invalid page number 5874
>>> On tree page 3878 cell 71: Failed to read ptrmap key=5876
>>> On tree page 3878 cell 71: invalid page number 5876
>>> On tree page 3878 cell 72: Bad ptr map entry key=4495
>>> expected=(5,3878)
>>> got=(2,0
>>> )
>>> On tree page 4495 cell 0: Bad ptr map entry key=4492
>>> expected=(3,4495)
>>> got=(2,0)
>>>
>>> On tree page 3878 cell 72: Child page depth differs
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to