I'm trying to piece together the thread safety guarantees that SQLite
provides. They don't appear to be spelled out explicitly, so I'm
attempting to infer the rules from the documents describing SQLite
internals, as well as the recent "file locks on Linux have thread
affinity" story.
Here is my understanding of the situation - I would greatly appreciate
it if SQLite experts would confirm or deny it.
For the purposes of this discussion, every connection handle (sqlite3*)
and every statement handle (sqlite3_stmt*) can be in one of two states -
"safe" and "unsafe". Various API calls transfer handles between these
states. While in a safe state, a handle can be passed freely between
threads. As soon as a call puts a handle into an unsafe state, all
further calls must arrive on the same thread until the handle becomes
safe again. Various levels of thread safety are determined by exactly
what calls transition handles between states.
I can think of three thread safety levels, arranged from weakest to
strongest. My question boils down to which level describes reality most
closely.
1. All handles are always unsafe, from sqlite3_open to sqlite3_close and
from sqlite3_prepare to sqlite3_finalize. All calls referring to a
particular connection and all statements associated with it must occur
on the same thread.
This is obviously a safe assumption, but also least useful in many
practical situations.
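Just so we're all describing the same thing, here is a minimal sketch of
what level 1 permits (the query text is only an example): every call
touching the connection and its statements happens on the one thread
that opened it.

    #include <sqlite3.h>
    #include <stdio.h>

    /* Level 1: one thread owns the connection and all its statements for
       their entire lifetime; no handle is ever seen by another thread. */
    static int run_on_one_thread(const char *path)
    {
        sqlite3 *db = 0;
        sqlite3_stmt *stmt = 0;
        int rc = sqlite3_open(path, &db);
        if (rc != SQLITE_OK) { sqlite3_close(db); return rc; }
        rc = sqlite3_prepare(db, "SELECT count(*) FROM sqlite_master",
                             -1, &stmt, 0);
        if (rc == SQLITE_OK) {
            if (sqlite3_step(stmt) == SQLITE_ROW)
                printf("%d objects\n", sqlite3_column_int(stmt, 0));
            sqlite3_finalize(stmt);
        }
        sqlite3_close(db);
        return rc;
    }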
2. A connection is safe as long as there is no activity on it - no open
transaction and no outstanding statements. The connection is born safe by
sqlite3_open, becomes unsafe as soon as a "begin transaction" is
executed or sqlite3_prepare is called, and becomes safe again when the
transaction (if any) is committed or rolled back and the last statement
is finalized.
A statement is always unsafe.
This assumption allows creating a connection pool used by a thread pool.
A worker thread grabs an idle connection, executes a batch of statements
on it, and returns it back to the pool where another thread can now use
it.
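A rough sketch of what I mean, assuming level 2 holds (the pool
structure and names are my own, not anything in SQLite):

    #include <sqlite3.h>
    #include <pthread.h>

    #define POOL_SIZE 4

    static sqlite3 *pool[POOL_SIZE];
    static int in_use[POOL_SIZE];
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    /* A worker grabs an idle connection; it is only handed out while it
       has no open transaction and no statements, i.e. while "safe". */
    static sqlite3 *pool_acquire(void)
    {
        sqlite3 *db = 0;
        pthread_mutex_lock(&pool_lock);
        for (int i = 0; i < POOL_SIZE; i++) {
            if (pool[i] && !in_use[i]) { in_use[i] = 1; db = pool[i]; break; }
        }
        pthread_mutex_unlock(&pool_lock);
        return db;   /* NULL if nothing is idle; the caller could wait */
    }

    /* The caller must have committed or rolled back and finalized all
       statements, so the handle is "safe" again before it comes back. */
    static void pool_release(sqlite3 *db)
    {
        pthread_mutex_lock(&pool_lock);
        for (int i = 0; i < POOL_SIZE; i++)
            if (pool[i] == db) in_use[i] = 0;
        pthread_mutex_unlock(&pool_lock);
    }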
I'm not sure how useful this optimization is. I have some experience
with traditional client-server databases, where establishing a
connection is a pretty expensive operation and connection pooling is
important. How expensive is sqlite3_open? Does the answer change if the
client code has to register a few custom collations, custom functions
and such every time it opens a connection? Is it worth it to maintain a
connection pool, or is it fine to just open a new connection every time
I need one?
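To make the question concrete, the per-connection setup I have in mind
looks roughly like this; my_collation and my_func are placeholders for
whatever the application actually registers:

    #include <sqlite3.h>
    #include <string.h>

    /* Placeholder collation: byte order, shorter string first on a tie. */
    static int my_collate(void *unused, int n1, const void *s1,
                          int n2, const void *s2)
    {
        int n = n1 < n2 ? n1 : n2;
        int r = memcmp(s1, s2, n);
        return r ? r : n1 - n2;
    }

    /* Placeholder scalar function: returns its argument's length in bytes. */
    static void my_func(sqlite3_context *ctx, int argc, sqlite3_value **argv)
    {
        sqlite3_result_int(ctx, sqlite3_value_bytes(argv[0]));
    }

    static int open_configured(const char *path, sqlite3 **out)
    {
        int rc = sqlite3_open(path, out);
        if (rc != SQLITE_OK) return rc;
        sqlite3_create_collation(*out, "my_collation", SQLITE_UTF8,
                                 0, my_collate);
        sqlite3_create_function(*out, "my_func", 1, SQLITE_UTF8,
                                0, my_func, 0, 0);
        return SQLITE_OK;
    }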
3. A statement is safe right after sqlite3_prepare, becomes unsafe on
the first sqlite3_step call, and safe again after sqlite3_reset. In
other words, a statement can be transferred between threads as long as it
does not touch actual data.
A connection is safe as long as there are no open transactions or
unsafe queries. As soon as a transaction opens or one statement is being
stepped through, all activity should happen on the same thread. Once the
activity stops (but there may still be freshly prepared or reset
statements), the connection is safe again and can be transferred to a
new thread.
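If assumption 3 holds, the points at which a statement could change
threads would look like this (a sketch of the assumption, obviously not
a statement of fact):

    #include <sqlite3.h>

    /* Assumption 3: the statement is transferable while it is freshly
       prepared or reset; between the first sqlite3_step and the next
       sqlite3_reset it must stay on one thread. */
    static int one_insert(sqlite3 *db)
    {
        sqlite3_stmt *stmt = 0;
        int rc = sqlite3_prepare(db, "INSERT INTO t(x) VALUES(?)",
                                 -1, &stmt, 0);
        if (rc != SQLITE_OK) return rc;
        /* safe: prepared, no data touched yet */

        sqlite3_bind_int(stmt, 1, 42);
        rc = sqlite3_step(stmt);      /* unsafe from here... */
        sqlite3_reset(stmt);          /* ...until the reset: safe again */

        /* safe: could be handed to another thread, if the assumption holds */
        sqlite3_finalize(stmt);
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }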
This assumption allows creating a pool of objects that encapsulate an
open connection together with a bunch of prepared statements - poor
man's stored procedures if you will. I believe this may prove useful in
some situations. E.g. imagine a system that receives a stream of records
over the network and needs to insert them into the database. It
maintains a thread pool where a worker can execute a job consisting of
inserting a single record. There are only a few different kinds of
insert statements (one for each table). It would help if a worker thread
could grab a connection and use a pre-compiled statement to execute its
job, saving the time spent preparing the same queries over and over.
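Something along these lines, assuming level 3 holds (the struct, table
and function names are all mine):

    #include <sqlite3.h>

    /* The "connection plus precompiled statements" bundle.  It can only
       move between worker threads while every statement is reset and no
       transaction is open - i.e. only if assumption 3 is true. */
    typedef struct {
        sqlite3      *db;
        sqlite3_stmt *insert_record;   /* one per table in the real thing */
    } db_worker_ctx;

    static int ctx_init(db_worker_ctx *ctx, const char *path)
    {
        int rc = sqlite3_open(path, &ctx->db);
        if (rc != SQLITE_OK) return rc;
        return sqlite3_prepare(ctx->db,
            "INSERT INTO records(payload) VALUES(?)", -1,
            &ctx->insert_record, 0);
    }

    /* Job executed by whichever worker thread currently owns the context. */
    static int insert_one(db_worker_ctx *ctx, const char *payload, int nbytes)
    {
        int rc;
        sqlite3_bind_blob(ctx->insert_record, 1, payload, nbytes,
                          SQLITE_TRANSIENT);
        rc = sqlite3_step(ctx->insert_record);
        sqlite3_reset(ctx->insert_record);   /* back to the "safe" state */
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }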
So, which assumption is correct? It appears to me that all three are
compatible with the "Linux thread-unsafe locks" issue, but I'd like to
receive confirmation. And of course, there's a big chance I'm missing
something obvious.
Also, how does sqlite3_interrupt fit into the picture? It's clearly only
useful if one can call it on a thread different from the one that's busy
executing a statement on the connection. Can it be assumed that
sqlite3_interrupt can be called from any thread at any time?
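The scenario I have in mind is a watchdog thread interrupting a
long-running query on another thread's connection; something like this
(the timeout and names are arbitrary):

    #include <sqlite3.h>
    #include <pthread.h>
    #include <unistd.h>

    /* Interrupts whatever query is running on the connection after five
       seconds.  Only makes sense if sqlite3_interrupt may be called from
       a thread other than the one doing sqlite3_step. */
    static void *watchdog(void *arg)
    {
        sqlite3 *db = (sqlite3*)arg;
        sleep(5);
        sqlite3_interrupt(db);   /* the busy thread sees SQLITE_INTERRUPT */
        return 0;
    }

    /* Usage, on the thread about to run the long query:
       pthread_t t;
       pthread_create(&t, 0, watchdog, db);
       ... sqlite3_step loop ...
       pthread_join(t, 0);                                              */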
A somewhat unrelated note: I think it would be useful to introduce a
function that clones an existing connection - that is, opens a new
connection to the same database as the existing one, and registers all
the same custom collations, custom functions, an authorizer, maybe a
busy handler, maybe a set of ATTACHed databases, and so on. Even nicer
would be the ability to clone a prepared statement so that the copy is
associated with a different connection to the same database.
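No such facility exists today, of course; by hand it comes down to
something like the following, where register_all_extras is a
hypothetical application function that repeats every registration made
on the original connection:

    #include <sqlite3.h>

    /* Hypothetical: re-register collations, functions, the authorizer,
       the busy handler, re-ATTACH databases, etc. on the new connection.
       This is exactly the boilerplate a built-in "clone" would remove. */
    extern void register_all_extras(sqlite3 *db);

    static int clone_connection(const char *same_path, sqlite3 **out)
    {
        int rc = sqlite3_open(same_path, out);
        if (rc != SQLITE_OK) return rc;
        register_all_extras(*out);
        return SQLITE_OK;
    }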
Igor Tandetnik