Yes, if we are in the middle of a lot of updates/inserts and just
terminate the thread, pragma integrity_check from the sqlite3 command
line tool will report corruption at times.  Normally, when we hard kill
a thread in the middle of these ops, a journal is left behind.  I think
we only see corruption in this case (journal left behind), but cannot be
sure.  Our transactions can be large (a few thousand records totaling a
few megabytes).  

Summary of steps:

1) hard-Kill a thread in the middle of inserting/updating a large
transaction (2mb+ transaction).  
2) with nothing else running, do a pragma integrity_check in sqlite3
command line client against the db.  Obviously after executing sqlite3
client, the journal disappears since I guess recovery ran.
3) integrity_check spits out lots of errors.

Note that we have never had this happen with Synchronous=Full, only
Synchronous=Off.  Have not tried normal.  That's all I was trying to
tell the original person asking.  I don't know why this would happen; it
seems logical that once you execute a write to the FS, whether or not
the app crashes/spontaneously exits that the write would make it.
However, all of this may be a red herring.. It turns out we moved to
Synch=Full after we sped up our db operations greatly... see below as to
why that may be the factor (speed), and not what Synch is set to.

By the way, this is all under windows.

I'm assuming that sqlite3 uses sync i/o via writefile.  Here's a quick
read on sync IO for windows:
http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx --
specifically what worries me is that TerminateThread() looks like it can
interrupt an I/O operation.

The problem I think may be people using TerminateThread();  that's how
you hard kill a thread.  It seems that can interrupt an I/O operation --
ie an operation that writes more than one cluster at a time.  Meaning,
synch = full may have nothing to do with it.  If you have to say write
more than one cluster (disk block), TerminateThread looks like it can
abort the IO op in the middle of a multi-block op?  I'm trying to run
that down but can't yet find anything that verifies this. 

So, here is what I think:
1) You need to write some data, lets say a 8K write.  In this
theoretical example, disk blocks are 4k.  
2) You call WriteFile(8K).
3) WriteFile causes a system call, and the system schedules the first
bit of the i/o (1st 4k).
4) terminatethread() is called
5) I/O operation is cancelled (ie, as if CancelIO had been called?),
meaning block #2 was never scheduled.  Database now corrupt.

A lot of people using windows tend to have this kind of threading
architecture:
1) Signal the thread to exit on its own
2) Wait for some grace period.
3) if grace period expires, and thread is still running --
TerminateThread() because once the grace period expires, the thread is
considered hung.

So, large transactions in a thread could cause people to use
TerminateThread() at a critical time, especially if that causes the
thread to go over its grace period.  For us, these large transactions
took longer than our grace period to complete, and thus were subject to
TerminateThread -- and lots of corruption.  Once we sped everything up
and moved to synch=full, no transaction was even close to the grace
period, and such, terminatethread() is never called.. and we get no
corruption.

Just a thought.

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
Sent: Wednesday, October 14, 2009 12:36 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Corrupted database


On Oct 14, 2009, at 12:42 AM, McClellen, Chris wrote:

> But it does happen and we can reproduce it.  Hard killing a thread is
> essentially equivalent to turning off the power.

We have always assumed that it is different. When you write data to
a file, the write is buffered in volatile memory by the OS for a time.
If a power failure occurs during this time, the write is lost. But if
a thread is killed, the OS should still eventually make sure the data
gets to stable storage.

If you kill the application, then open the database using the shell
tool, is the database corrupted?

Dan.



> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
> Sent: Tuesday, October 13, 2009 12:35 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Corrupted database
>
>
> On Oct 13, 2009, at 1:56 AM, McClellen, Chris wrote:
>
>> What is your synchronous set to?  Full?  FYI If you are using .NET
>> data
>> providers, it is set to "Normal" by default.
>>
>> If it is not set to full, I have seen corruption when an application
>> crashes, or exits when a thread is in the middle of updating the db
>> (Synchronous = OFF makes corruption even easier in this case).  I  
>> have
>> seen apps that do not wait for background threads to finish before
>> termination, and without full sync on, either the db or the log gets
>> corrupted.  A corrupted log can cause problems for you db on next run
>> when recovery happens.
>
> In theory, this shouldn't happen. Unless the application is actually
> buffering data that SQLite thinks has been written to the database or
> journal file in the process space on some systems.
>
> The "synchronous" setting should only make a difference in the event
> of a power or OS failure. That's the theory, anyway.
>
> Dan.
>
>
>
>>
>>
>>
>> -----Original Message-----
>> From: sqlite-users-boun...@sqlite.org
>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Filip Navara
>> Sent: Monday, October 12, 2009 12:38 PM
>> To: General Discussion of SQLite Database
>> Subject: [sqlite] Corrupted database
>>
>> Hello,
>>
>> for a few months we have been occasionally getting corrupted  
>> databases
>> in the field. So far we were unable to acquire any of them from our
>> customers, but this week I finally got hold of one. Output from
>> "pragma integrity_check" is included below.
>>
>> The schema is the following:
>>
>> CREATE VIRTUAL TABLE "LocalMailsIndex3" USING fts3 (id INTEGER,
>> partName TEXT, content TEXT);
>> CREATE TABLE LocalMailsIndex3_content(  docid INTEGER PRIMARY
>> KEY,c0id, c1partName, c2content);
>> CREATE TABLE LocalMailsIndex3_segdir(  level integer,  idx integer,
>> start_block integer,  leaves_end_block integer,  end_block integer,
>> root blob,  primary key(level, idx));
>> CREATE TABLE LocalMailsIndex3_segments(  blockid INTEGER PRIMARY KEY,
>> block blob);
>> CREATE INDEX "LocalMailsIndex3_contentIndex" ON
>> "LocalMailsIndex3_content" ("c0id", "c1partName");
>>
>> The database is created using SQLite 3.6.14.2, thread safe, on  
>> Windows
>> with auto_vacuum=incremental. It is always opened as attached  
>> database
>> with journal_mode=persist. Application crashes were most probably
>> involved, but no operating system / power crashes as far as I know.
>>
>> One thread in the application is periodically running "pragma
>> freelist_count" and "pragma incremental_vacuum(...)". Other threads
>> are running combination of the following commands and no other:
>>
>> INSERT INTO mail_fti.LocalMailsIndex3("id", "partName", "content")
>> VALUES (@id, @partName, @content)
>> SELECT c2content AS content FROM mail_fti.LocalMailsIndex3_content
>> WHERE c0...@id AND c1partna...@partname
>> SELECT docid FROM mail_fti.LocalMailsIndex3_content WHERE c0id IN
>> (...)
>> DELETE FROM mail_fti.LocalMailsIndex3 WHERE doc...@docid
>> SELECT ... WHERE "id" IN (SELECT "id" FROM mail_fti.LocalMailsIndex3
>> WHERE content MATCH "...")
>>
>> Anybody has seen something like this?
>> Anybody willing to look at it? I can send the database privately.
>>
>> Best regards,
>> Filip Navara
>>
>> Main freelist: Bad ptr map entry key=5143 expected=(2,0) got=(3,4467)
>> Main freelist: freelist leaf count too big on page 5143
>> Main freelist: Bad ptr map entry key=5449 expected=(2,0) got=(4,5143)
>> Main freelist: freelist leaf count too big on page 5449
>> Main freelist: 904 of 908 pages missing from overflow list starting  
>> at
>> 5143
>> On tree page 3878 cell 26: invalid page number 5737
>> On tree page 3878 cell 26: Child page depth differs
>> On tree page 3878 cell 27: Failed to read ptrmap key=5746
>> On tree page 3878 cell 27: invalid page number 5746
>> On tree page 3878 cell 28: Failed to read ptrmap key=5748
>> On tree page 3878 cell 28: invalid page number 5748
>> On tree page 3878 cell 29: Failed to read ptrmap key=5749
>> On tree page 3878 cell 29: invalid page number 5749
>> On tree page 3878 cell 30: Failed to read ptrmap key=5755
>> On tree page 3878 cell 30: invalid page number 5755
>> On tree page 3878 cell 31: Failed to read ptrmap key=5757
>> On tree page 3878 cell 31: invalid page number 5757
>> On tree page 3878 cell 32: Failed to read ptrmap key=5759
>> On tree page 3878 cell 32: invalid page number 5759
>> On tree page 3878 cell 33: Failed to read ptrmap key=5761
>> On tree page 3878 cell 33: invalid page number 5761
>> On tree page 3878 cell 34: Failed to read ptrmap key=5763
>> On tree page 3878 cell 34: invalid page number 5763
>> On tree page 3878 cell 35: Failed to read ptrmap key=5767
>> On tree page 3878 cell 35: invalid page number 5767
>> On tree page 3878 cell 36: Failed to read ptrmap key=5769
>> On tree page 3878 cell 36: invalid page number 5769
>> On tree page 3878 cell 37: Failed to read ptrmap key=5771
>> On tree page 3878 cell 37: invalid page number 5771
>> On tree page 3878 cell 38: Failed to read ptrmap key=5773
>> On tree page 3878 cell 38: invalid page number 5773
>> On tree page 3878 cell 39: Failed to read ptrmap key=5775
>> On tree page 3878 cell 39: invalid page number 5775
>> On tree page 3878 cell 40: Failed to read ptrmap key=5777
>> On tree page 3878 cell 40: invalid page number 5777
>> On tree page 3878 cell 41: Failed to read ptrmap key=5780
>> On tree page 3878 cell 41: invalid page number 5780
>> On tree page 3878 cell 42: Failed to read ptrmap key=5783
>> On tree page 3878 cell 42: invalid page number 5783
>> On tree page 3878 cell 43: Failed to read ptrmap key=5787
>> On tree page 3878 cell 43: invalid page number 5787
>> On tree page 3878 cell 44: Failed to read ptrmap key=5789
>> On tree page 3878 cell 44: invalid page number 5789
>> On tree page 3878 cell 45: Failed to read ptrmap key=5793
>> On tree page 3878 cell 45: invalid page number 5793
>> On tree page 3878 cell 46: Failed to read ptrmap key=5795
>> On tree page 3878 cell 46: invalid page number 5795
>> On tree page 3878 cell 47: Failed to read ptrmap key=5797
>> On tree page 3878 cell 47: invalid page number 5797
>> On tree page 3878 cell 48: Failed to read ptrmap key=5801
>> On tree page 3878 cell 48: invalid page number 5801
>> On tree page 3878 cell 49: Failed to read ptrmap key=5805
>> On tree page 3878 cell 49: invalid page number 5805
>> On tree page 3878 cell 50: Failed to read ptrmap key=5807
>> On tree page 3878 cell 50: invalid page number 5807
>> On tree page 3878 cell 51: Failed to read ptrmap key=5810
>> On tree page 3878 cell 51: invalid page number 5810
>> On tree page 3878 cell 52: Failed to read ptrmap key=5813
>> On tree page 3878 cell 52: invalid page number 5813
>> On tree page 3878 cell 53: Failed to read ptrmap key=5817
>> On tree page 3878 cell 53: invalid page number 5817
>> On tree page 3878 cell 54: Failed to read ptrmap key=5822
>> On tree page 3878 cell 54: invalid page number 5822
>> On tree page 3878 cell 55: Failed to read ptrmap key=5826
>> On tree page 3878 cell 55: invalid page number 5826
>> On tree page 3878 cell 56: Failed to read ptrmap key=5831
>> On tree page 3878 cell 56: invalid page number 5831
>> On tree page 3878 cell 57: Failed to read ptrmap key=5836
>> On tree page 3878 cell 57: invalid page number 5836
>> On tree page 3878 cell 58: Failed to read ptrmap key=5838
>> On tree page 3878 cell 58: invalid page number 5838
>> On tree page 3878 cell 59: Failed to read ptrmap key=5841
>> On tree page 3878 cell 59: invalid page number 5841
>> On tree page 3878 cell 60: Failed to read ptrmap key=5843
>> On tree page 3878 cell 60: invalid page number 5843
>> On tree page 3878 cell 61: Failed to read ptrmap key=5846
>> On tree page 3878 cell 61: invalid page number 5846
>> On tree page 3878 cell 62: Failed to read ptrmap key=5848
>> On tree page 3878 cell 62: invalid page number 5848
>> On tree page 3878 cell 63: Failed to read ptrmap key=5850
>> On tree page 3878 cell 63: invalid page number 5850
>> On tree page 3878 cell 64: Failed to read ptrmap key=5853
>> On tree page 3878 cell 64: invalid page number 5853
>> On tree page 3878 cell 65: Failed to read ptrmap key=5857
>> On tree page 3878 cell 65: invalid page number 5857
>> On tree page 3878 cell 66: Failed to read ptrmap key=5859
>> On tree page 3878 cell 66: invalid page number 5859
>> On tree page 3878 cell 67: Failed to read ptrmap key=5862
>> On tree page 3878 cell 67: invalid page number 5862
>> On tree page 3878 cell 68: Failed to read ptrmap key=5868
>> On tree page 3878 cell 68: invalid page number 5868
>> On tree page 3878 cell 69: Failed to read ptrmap key=5870
>> On tree page 3878 cell 69: invalid page number 5870
>> On tree page 3878 cell 70: Failed to read ptrmap key=5874
>> On tree page 3878 cell 70: invalid page number 5874
>> On tree page 3878 cell 71: Failed to read ptrmap key=5876
>> On tree page 3878 cell 71: invalid page number 5876
>> On tree page 3878 cell 72: Bad ptr map entry key=4495
>> expected=(5,3878)
>> got=(2,0
>> )
>> On tree page 4495 cell 0: Bad ptr map entry key=4492  
>> expected=(3,4495)
>> got=(2,0)
>>
>> On tree page 3878 cell 72: Child page depth differs
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to