Thanks for the detailed trouble analysis!

Short answer:   Probably you want to do "BEGIN IMMEDIATE" rather than
just "BEGIN" to start your transaction.

The busy timeout is not working because you start out your transaction
using a read operation - the first SELECT statement - which gets a
read lock.  Later when you go to COMMIT, this has to elevate to a
write lock.  But SQLite sees that some other process has already
updated the database since you started your read.  In other words,
your attempted writes are based on stale data and will be rejected.
There is no point in waiting for the other process to finish its
transaction because you are not going to be able to write with stale
data anyhow, so it gives you an SQLITE_BUSY immediately.

By using BEGIN IMMEDIATE, your process reserves the right to write
immediately.  If the BEGIN IMMEDIATE is successful, then the COMMIT is
pretty much guaranteed to run without problems (assuming you don't run
out of disk space or have other highly unusual I/O problems).

If you are daring, you can also try building from the begin-concurrent
branch (https://www.sqlite.org/src/timeline?r=begin-concurrent) and
using "BEGIN CONCURRENT" instead of "BEGIN" to start your transaction.
In that case, the COMMIT might still fail with a conflict, but only if
the intervening transaction modified pages of the database file that
your transaction actually read.  So if the database has two tables,
and you are updating one table, and some other process is updating the
other table, then there is a high chance that both transaction will
run to completion.


On 1/19/19, andrew.g...@l3t.com <andrew.g...@l3t.com> wrote:
> I am having significant difficulty getting two process to not slam into each
> other when accessing the same database.  The behavior I want is for one
> process to wait while the database is busy being written to by the other,
> but instead I immediately get SQLITE_BUSY.  This is happening to me when
> using a local filesystem, so NFS is not the issue.  (Though I expect NFS to
> bite me soon enough.)
>
> My trouble began on a Red Hat Enterprise Linux 6.4 system, though said
> computer exhibited other bizarre behaviors in the past (randomly unmounting
> filesystems and changing existing processes' current working directories to
> empty string), so I moved to Slackware64-current to make sure I could
> reproduce the problem in a modern environment.  Sure enough, I can.
>
> sqlite3_busy_timeout(db, 5000) doesn't seem to help, not even when I add
> -DSQLITE_ENABLE_SETLK_TIMEOUT to my SQLite build command line.
> sqlite3_step() doing an INSERT randomly fails to acquire the RESERVED lock.
> I even tried writing my own retry-on-busy code (made several attempts), with
> no success.
>
> I'm using SQLite 3.26.0 [bf8c1b2b7a] with GCC 8.2.0 on Linux 4.19.6, though
> I also had the same problems with GCC 4.4.7 on Linux
> 2.6.32-431.3.1.el6.x86_64.
>
> Tracing through the SQLite amalgamation code with gdb, one thing I find odd
> is line 66334 doesn't seem to actually ever try calling
> btreeInvokeBusyHandler(pBt) despite rc being SQLITE_BUSY, since
> pbt->inTransaction isn't TRANS_NONE but rather TRANS_WRITE (I think).  This
> prevents it from waiting and/or retrying if busy, so the loop immediately
> bails, then SQLITE_BUSY rapidly percolates up to my application.
>
> However, there's a chance my analysis may be invalid because I made my
> application pause with SIGSTOP on SQLITE_BUSY, at which point I attached gdb
> and forced it to call sqlite3_step() again so I could watch the execution
> path.  So maybe pbt->inTransaction was set sometime after unixFileLock()
> decided things went wrong.  But I highly doubt it.
>
> Maybe a better debug strategy would have been to set a breakpoint in one
> process right when it acquires RESERVED, then start a second process in gdb
> and see what happens when it can't immediately get its own RESERVED lock.
> (Or can't get SHARED because the first process had PENDING, etc.)  But it's
> midnight on Friday and I need to go home.
>
> By the way, the SQLITE_LOCK_TRACE code seems to have a little bitrot.  I had
> to move the #define sqlite3DebugPrintf section earlier in the amalgamation,
> plus change the first %d on line 33121 to %ld because my pthread_t is a
> 64-bit value.
>
> I created a test program that reproduces the issue, pasted at the bottom of
> this email.  (Are attachments allowed on this mailing list?  I sure hope my
> mailer doesn't eat my newlines.)  Sometimes it fails every time I try,
> sometimes I have to try it a dozen times.  To make it fail more frequently,
> uncomment more fork() calls.  (My original application on the Red Hat system
> failed every single time I ran two copies at once.)
>
> No clue if this is of any value, but here's part of the output from
> SQLITE_LOCK_TRACE:
>
> OPEN 11480672 test.db
> fcntl 140340145452864 3 SETLK RDLCK 1073741824 1 4222915 0
> fcntl 140340145452864 3 SETLK RDLCK 1073741826 510 4222915 0
> fcntl 140340145452864 3 SETLK UNLCK 1073741824 1 4222915 0
> FETCH 11480672 page 1 hash(00000000)
> fcntl 140340145452864 3 SETLK UNLCK 0 0 4221434 0
> fcntl 140340145452864 3 SETLK RDLCK 1073741824 1 0 0
> fcntl 140340145452864 3 SETLK RDLCK 1073741826 510 0 0
> fcntl 140340145452864 3 SETLK UNLCK 1073741824 1 0 0
> FETCH 11480672 page 2 hash(00000000)
> fcntl 140340145452864 3 SETLK WRLCK 1073741825 1 4273584 0
> TRANSACTION 11480672
> JOURNAL 11480672 page 2 needSync=1 hash(00000000)
> fcntl 140340145452864 3 SETLK WRLCK 1073741824 1 2 0
> fcntl 140340145452864 3 SETLK WRLCK 1073741826 510 2 -1
> fcntl-failure-reason: RDLCK 1073741826 510 16020
> fcntl 140340145452864 3 SETLK WRLCK 1073741826 510 2 -1
> fcntl-failure-reason: RDLCK 1073741826 510 16020
> (last two lines repeat)
>
> Here's the test SQL being executed:
>
> CREATE TABLE t(a, b, c);              -- Once only
> BEGIN;
> SELECT count(*) FROM t;               -- Used as :rows below
> INSERT INTO t VALUES(:pid, :rows, 0); -- :pid is getpid()
> INSERT INTO t VALUES(:pid, :rows, 1);
> INSERT INTO t VALUES(:pid, :rows, 2);
> COMMIT;
>
> Here's test.c:
>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include "sqlite3.h"
> #define ACTION raise(SIGSTOP)
> /*#define ACTION exit(EXIT_FAILURE) */
> #define die(format, ...) (fprintf(stderr, "(%d) %s:%d: " format "\n", \
>         getpid(), strrchr("/" __FILE__, '/') + 1, __LINE__, ##__VA_ARGS__),
> \
>         ACTION)
> int main(void)
> {
>     sqlite3 *db;
>     sqlite3_stmt *stmt;
>     char *p;
>     int i;
>     int rows;
>     int rc;
>     /* Initialize database if the file does not exist. */
>     if (access("test.db", F_OK) < 0) {
>         if ((rc = sqlite3_open_v2("test.db", &db,
>                 SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
>                 NULL)) != SQLITE_OK) {
>             die("sqlite3_open_v2: %s",
>                     db ? sqlite3_errmsg(db) : sqlite3_errstr(rc));
>         } else if (sqlite3_exec(db, "CREATE TABLE t(a, b, c)",
>                 NULL, NULL, &p) != SQLITE_OK) {
>             die("sqlite3_exec: %s", p);
>         } else if ((rc = sqlite3_close(db)) != SQLITE_OK) {
>             die("sqlite3_close: %s", sqlite3_errstr(rc));
>         }
>     }
>     /* Do everything simultaneously. */
>     fork();
> /*  fork();
>     fork();
>     fork(); Uncomment more of these for increased parallelism. */
>     /* Open the database. */
>     if ((rc = sqlite3_open_v2("test.db", &db,
>             SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
>             NULL)) != SQLITE_OK) {
>         die("sqlite3_open_v2: %s",
>                 db ? sqlite3_errmsg(db) : sqlite3_errstr(rc));
>     }
>     /* Set busy timeout. */
>     if (sqlite3_busy_timeout(db, 5000) != SQLITE_OK) {
>         die("sqlite3_busy_timeout: %s", sqlite3_errmsg(db));
>     }
>     /* Begin the transaction. */
>     if (sqlite3_exec(db, "BEGIN", NULL, NULL, &p) != SQLITE_OK) {
>         die("sqlite3_exec: %s", p);
>     }
>     /* Get the number of rows already present in the table. */
>     sqlite3_prepare_v2(db, "SELECT count(*) FROM t", -1, &stmt, NULL);
>     if (!stmt) {
>         die("sqlite3_prepare_v2: %s", sqlite3_errmsg(db));
>     } else if (sqlite3_step(stmt) != SQLITE_ROW) {
>         die("sqlite3_step: %s", sqlite3_errmsg(db));
>     }
>     rows = sqlite3_column_int(stmt, 0);
>     if (sqlite3_finalize(stmt) != SQLITE_OK) {
>         die("sqlite3_finalize: %s", sqlite3_errmsg(db));
>     }
>     /* Insert some rows. */
>     sqlite3_prepare_v2(db, "INSERT INTO t VALUES(?, ?, ?)", -1, &stmt,
> NULL);
>     if (!stmt) {
>         die("sqlite3_prepare_v2: %s", sqlite3_errmsg(db));
>     }
>     for (i = 0; i < 3; ++i) {
>         if (sqlite3_bind_int(stmt, 1, (int)getpid()) != SQLITE_OK) {
>             die("sqlite3_bind_int: %s", sqlite3_errmsg(db));
>         } else if (sqlite3_bind_int(stmt, 2, rows) != SQLITE_OK) {
>             die("sqlite3_bind_int: %s", sqlite3_errmsg(db));
>         } else if (sqlite3_bind_int(stmt, 3, i) != SQLITE_OK) {
>             die("sqlite3_bind_int: %s", sqlite3_errmsg(db));
>         } else if (sqlite3_step(stmt) != SQLITE_DONE) {
>             die("sqlite3_step: %s", sqlite3_errmsg(db));
>         } else if (sqlite3_reset(stmt) != SQLITE_OK) {
>             die("sqlite3_reset: %s", sqlite3_errmsg(db));
>         }
>     }
>     if (sqlite3_finalize(stmt) != SQLITE_OK) {
>         die("sqlite3_finalize: %s", sqlite3_errmsg(db));
>     }
>     /* Commit the transaction. */
>     if (sqlite3_exec(db, "COMMIT", NULL, NULL, &p) != SQLITE_OK) {
>         die("sqlite3_exec: %s", p);
>     }
>     /* Close the database. */
>     if ((rc = sqlite3_close(db)) != SQLITE_OK) {
>         die("sqlite3_close: %s", sqlite3_errstr(rc));
>     }
>     /* Congratulations on making it this far. */
>     return EXIT_SUCCESS;
> }
> /* vim: set sts=4 sw=4 tw=80 et ft=c: */
>
> Here's the makefile:
>
> CFLAGS += -ggdb3
> CFLAGS += -Wall
> test: LDFLAGS += -ldl
> test: LDFLAGS += -lpthread
> test: test.o sqlite3.o
> test.o: test.c sqlite3.h
> sqlite3.o: CPPFLAGS += -D_GNU_SOURCE
> sqlite3.o: CPPFLAGS += -DHAVE_INTTYPES_H
> sqlite3.o: CPPFLAGS += -DHAVE_STDINT_H
> sqlite3.o: CPPFLAGS += -DHAVE_FDATASYNC
> sqlite3.o: CPPFLAGS += -DHAVE_USLEEP
> sqlite3.o: CPPFLAGS += -DHAVE_LOCALTIME_R
> sqlite3.o: CPPFLAGS += -DHAVE_GMTIME_R
> sqlite3.o: CPPFLAGS += -DHAVE_STRERROR_R
> sqlite3.o: CPPFLAGS += -DHAVE_POSIX_FALLOCATE
> #sqlite3.o: CPPFLAGS += -DSQLITE_ENABLE_SETLK_TIMEOUT
> sqlite3.o: sqlite3.c
> clean:
>     -$(RM) test test.o sqlite3.o
> # vim: set ts=4 sw=4 tw=80 noet ft=make:
> ------------------------------------------- CONFIDENTIALITY NOTICE: This
> email and any attachments are for the sole use of the intended recipient and
> may contain material that is proprietary, confidential, privileged or
> otherwise legally protected or restricted under applicable government laws.
> Any review, disclosure, distributing or other use without expressed
> permission of the sender is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies without reading,
> printing, or saving..
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to