On 09/28/2012 03:32 AM, Jonathan Engle wrote:
I've been picking away at this for the last few days and have it
narrowed down fairly well.

It looks like if I turn off shared cache, it works fine (same
application code).

If I run with SQL_DEBUG enabled, the first issue I run into in an
assertion in sqlite3BtreeEnter: assert(
sqlite3_mutex_held(p->db->mutex) ); The call stack from it is

sqlite3BackupUpdate backupOnePage sqlite3BtreeGetReserve(p->pSrc)
sqlite3BtreeEnter

Look up the stack, it looks like sqlite3BackupUpdate locks the mutex
on the destination database but not the source.

Say you have an active backup operation (one created by backup_init()
but not yet completed) using source database handle db1. In
non-shared-cache mode. The backup is half-way done - 50% of the source
database pages have been copied to the destination.

If the source db is written by another process at this point, or using
a database handle other than db1, the backup operation has to start over
from the beginning on the next call to sqlite3_backup_step().

However, if the app writes to the source database using handle
db1, SQLite will automatically update the backup database as well. So
that the backup operation doesn't have to restart. That's the call to
sqlite3BackupUpdate() above. As you say, the code assumes that the
mutex on the source database handle (i.e. db1) is already held.

Turns out that this assumption is only true in non-shared-cache mode.
Because of the way the code is structured, in shared-cache mode, this
call to sqlite3BackupUpdate() will be made even if the source database
is updated using a second database handle - db2. But the backup code
still calls routines that assume the db1 mutex is held... Bug.

In the deadlock scenario, all the threads are blocked in
lockBtreeMutex(). This routine is supposed to prevent deadlock
by ensuring that mutexes are only obtained in a globally defined
order. But that could malfunction in unpredictable ways if two threads
were running the lockBtreeMutex() code on behalf of the same database
connection simultaneously. The mutex on the database handle is
supposed to prevent that from happening, but since the bug above
allows lockBtreeMutex() to be called without actually holding
the mutex it easily might.

I think the fix will likely be to have shared-cache mode work like
non-shared-cache mode - force the backup to start over if the source
database is written via a second database handle (i.e. db2).

Dan.





Tried as a test adding locking the source db, bad results. Altered
the definition of asserts to make them not fatal, got a ton of
assertions then deadlocking again.

Haven't tried to make a sample program yet, but the gist of it would
be to have one (or more threads) doing lots of small transactions
updating the database while simultaneously having another thread
continuously making a backup of the db (unrealistic scenario, just
makes the race easier to see).

It may or may not matter whether or not encryption is used, or more
importantly whether SQLITE_HAS_CODEC is defined, since the portion of
code that's asserting is only there when SQLITE_HAS_CODEC is
defined.

At this point, I guess I'll just run without enabling shared cache,
which seems to work just fine (a little better with regards to
backups actually) and just hope this gets fixed in a future release.

Jon



It looks like it's unhappy that the mutex for the source database in
the

On Aug 25, 2012, at 1:33 PM, Jonathan Engle wrote:

No, the deadlock is deeper than that, it's stuck trying to lock
mutexes.  My current theory is that the thread trying to update the
page in the backup destination database is what's causing trouble.

I also forgot to mention, each thread is using a different
connection object and that it's using shared cache mode.

Jon On Aug 25, 2012, at 12:57 PM, Patrik Nilsson wrote:

Do you test for the backup errors, i.e. SQLITE_BUSY and
SQLITE_LOCKED?

Do you test for step errors, i.e.  SQLITE_BUSY?

If you get the busy error, you can wait a while and try again or
start over.

/Patrik

On 08/24/2012 05:46 PM, Jonathan Engle wrote:
Ran into this recently, it's happened on one machine running a
beta test of our software.  This is a multi-threaded
application, and I've run into a sequence of steps that
deadlocks hard that as far as I can tell from the documentation
shouldn't. This is using SQLite 3.7.13 with SEE. The source
database is using WAL mode, all transactions are done as
IMMEDIATE, synchronous mode is set to 0, and it is encrypted.
The destination database for the backup is not encrypted, and
is default (non-WAL, full synchronous) modes.


There are multiple threads active:

- one performing a write - two performing reads - one closing a
connection - one is in the middle of a backup operation

Here are the call stacks for the threads:


Writing thread:

sqlite3_step sqlite3VdbeExec sqlite3VdbeHalt
sqlite3BtreeCommitPhaseOne sqlite3PagerCommitPhaseOne
pagerWalFrames sqlite3BackupUpdate backupOnePage
sqlite3BtreeEnter lockBtreeMutex pthread_mutex_lock
__psynch_mutexwait

Closing a connection thread:

sqlite3_close sqlite3BtreeEnterAll sqlite3BtreeEnter
lockBtreeMutex pthread_mutex_lock __psynch_mutexwait

Reading thread:  sqlite3_step sqlite3VdbeExec sqlite3VdbeEnter
sqlite3BtreeEnter lockBtreeMutex pthread_mutex_lock
__psynch_mutexwait

Backing up thread:  sqlite3_backup_step sqlite3BtreeEnter
lockBtreeMutex pthread_mutex_lock __psynch_mutexwait  Reading
thread:

sqlite3_step sqlite3VdbeExec sqlite3VdbeEnter
sqlite3BtreeEnter lockBtreeMutex pthread_mutex_lock
__psynch_mutexwait



Also, the destination database for the backup is created on the
stack by the the thread doing the backup and is never passed
out to anybody (explicitly).

What looks like is happening to me is that the writing and
backing-up thread are deadlocking with each other, with
'sqlite3BackupUpdate' attempting to update the backup
destination database.  Unfortunately, this is not something
I've reproduced locally, so I can't look parameters or lock
states.  I'm going to try, as a kind of hail-mary, putting a
BEGIN IMMEDIATE transactions around the backup to block writing
during the database backup.

If anyone has any suggestions or ideas about what I might be
doing wrong here, I'd appreciate it.


_______________________________________________ sqlite-users
mailing list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________ sqlite-users
mailing list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________ sqlite-users
mailing list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________ sqlite-users mailing
list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to