I'm trying to piece together the thread safety guarantees that SQLite provides. They don't appear to be spelled out explicitly. So I'm trying to infer the rules from the documents describing SQLite internals, as well as the recent "file locks on linux have thread affinity" story.

Here is my understanding of the situation - I would greatly appreciate it if SQLite experts would confirm or deny it.

For the purposes of this discussion, every connection handle (sqlite3*) and every statement handle (sqlite3_stmt*) can be in one of two states - "safe" and "unsafe". Various API calls transfer handles between these states. While in a safe state, a handle can be passed freely between threads. As soon as a call puts a handle into an unsafe state, all further calls must arrive on the same thread until the handle becomes safe again. The various levels of thread safety are determined by exactly which calls transition handles between states.

I can think of three thread safety levels, arranged from weakest to strongest. My question boils down to which level describes reality most closely.

1. All handles are always unsafe, from sqlite3_open to sqlite3_close and from sqlite3_prepare to sqlite3_finalize. All calls referring to a particular connection and all statements associated with it must occur on the same thread.

This is obviously a safe assumption, but also the least useful one in many practical situations.


2. A connection is safe as long as there is no activity on it - there is no open transaction and no statements. The connection is born safe by sqlite3_open, becomes unsafe as soon as a "begin transaction" is executed or sqlite3_prepare is called, and becomes safe again when the transaction (if any) is committed or rolled back and the last statement is finalized.
   A statement is always unsafe.

This assumption allows creating a connection pool used by a thread pool. A worker thread grabs an idle connection, executes a batch of statements on it, and returns it to the pool, where another thread can pick it up.
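
For concreteness, here is a minimal sketch of the pooling pattern I mean (pthreads, error handling and pool setup omitted; the pool type and helper names are my own, not part of SQLite):

    #include <pthread.h>
    #include <sqlite3.h>

    /* Hypothetical fixed-size connection pool - names are mine. */
    typedef struct {
        sqlite3        *conns[8];
        int             count;      /* number of idle connections */
        pthread_mutex_t lock;
        pthread_cond_t  available;
    } conn_pool;

    /* A worker thread borrows an idle connection... */
    static sqlite3 *pool_acquire(conn_pool *p) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0)
            pthread_cond_wait(&p->available, &p->lock);
        sqlite3 *db = p->conns[--p->count];
        pthread_mutex_unlock(&p->lock);
        return db;
    }

    /* ...runs its batch to completion (no open transaction, all statements
       finalized), and only then hands the connection back, where a different
       thread may pick it up - which is what assumption 2 would permit. */
    static void pool_release(conn_pool *p, sqlite3 *db) {
        pthread_mutex_lock(&p->lock);
        p->conns[p->count++] = db;
        pthread_cond_signal(&p->available);
        pthread_mutex_unlock(&p->lock);
    }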

I'm not sure how useful this optimization is. I have some experience with traditional client-server databases, where establishing a connection is a pretty expensive operation and connection pooling is important. How expensive is sqlite3_open? Does the answer change if the client code has to register a few custom collations, custom functions and such every time it opens a connection? Is it worth it to maintain a connection pool, or is it fine to just open a new connection every time I need one?
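
To make the cost question concrete, the per-open setup I have in mind looks roughly like this (the collation and function implementations are placeholders; only the repeated registration work is at issue):

    #include <sqlite3.h>

    /* Placeholder callbacks - their bodies are irrelevant here. */
    static int  my_collate(void *ctx, int n1, const void *s1, int n2, const void *s2);
    static void my_func(sqlite3_context *ctx, int argc, sqlite3_value **argv);

    static int open_and_configure(const char *path, sqlite3 **out) {
        int rc = sqlite3_open(path, out);
        if (rc != SQLITE_OK) return rc;
        /* Every freshly opened connection has to repeat these registrations. */
        sqlite3_create_collation(*out, "mycoll", SQLITE_UTF8, 0, my_collate);
        sqlite3_create_function(*out, "myfunc", 1, SQLITE_UTF8, 0, my_func, 0, 0);
        return SQLITE_OK;
    }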


3. A statement is safe right after sqlite3_prepare, becomes unsafe on the first sqlite3_step call, and safe again after sqlite3_reset. In other words, a statement can be transferred between threads as long as it does not touch actual data. A connection is safe as long as there are no open transactions or unsafe statements. As soon as a transaction opens or one statement is being stepped through, all activity should happen on the same thread. Once the activity stops (but there may still be freshly prepared or reset statements), the connection is safe again and can be transferred to a new thread.

This assumption allows creating a pool of objects that encapsulate an open connection together with a bunch of prepared statements - poor man's stored procedures, if you will. I believe this may prove useful in some situations. E.g. imagine a system that receives a stream of records over the network and needs to insert them into the database. It maintains a thread pool where a worker can execute a job consisting of inserting a single record. There are only a few different kinds of insert statements (one for each table). It would help if a worker thread could grab a connection and use a pre-compiled statement to execute its job. That saves the time spent preparing the same queries over and over. A sketch of the pattern follows.
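
Here is a minimal sketch of what assumption 3 would allow (error handling omitted; the table, statement set and helper names are my invention, purely illustrative):

    #include <sqlite3.h>

    /* Hypothetical pooled object: one connection plus its pre-compiled
       insert statements. */
    typedef struct {
        sqlite3      *db;
        sqlite3_stmt *insert_orders;   /* e.g. "INSERT INTO orders VALUES(?,?,?)" */
        sqlite3_stmt *insert_events;   /* e.g. "INSERT INTO events VALUES(?,?)"   */
    } db_worker_ctx;

    /* A worker thread grabs an idle db_worker_ctx, runs one job, and leaves
       every statement reset, so that (under assumption 3) the whole object is
       "safe" again and can migrate to whichever thread runs the next job. */
    static int insert_order(db_worker_ctx *ctx, int id, int qty, double price) {
        sqlite3_stmt *s = ctx->insert_orders;   /* prepared once, reused forever */
        sqlite3_bind_int(s, 1, id);
        sqlite3_bind_int(s, 2, qty);
        sqlite3_bind_double(s, 3, price);
        int rc = sqlite3_step(s);               /* first step: statement goes "unsafe" */
        sqlite3_reset(s);                       /* back to "safe" */
        sqlite3_clear_bindings(s);
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }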


So, which assumption is correct? It appears to me that all three are compatible with the "linux thread-unsafe locks" issue, but I'd like to receive confirmation. And of course, there's a big chance I'm missing something obvious.


Also, how does sqlite3_interrupt fit into the picture? It's clearly only useful if one can call it on a thread different from the one that's busy executing a statement on the connection. Can it be assumed that sqlite3_interrupt can be called from any thread at any time?
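
The scenario I have in mind is roughly this (the watchdog thread and its timeout are hypothetical; the question is whether the cross-thread call is sanctioned):

    #include <pthread.h>
    #include <unistd.h>
    #include <sqlite3.h>

    /* Thread 1 is grinding through a long-running query on `db` via
       sqlite3_step. This watchdog thread decides the query has taken too
       long and asks it to stop - from a different thread. */
    static void *watchdog(void *arg) {
        sqlite3 *db = (sqlite3 *)arg;
        sleep(5);                 /* hypothetical timeout */
        sqlite3_interrupt(db);    /* called from a thread other than the one stepping */
        return 0;
    }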


A somewhat unrelated note: I think it would be useful to introduce a function that clones an existing connection - that is, opens a new connection to the same database as the existing one, and registers all the same custom collations, custom functions, an authorizer, maybe a busy handler, maybe a set of ATTACHed databases and so on. Even nicer would be an ability to clone a prepared statement so that the copy is associated with a different connection to the same database.
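
In terms of shape, I am picturing something like the following - nothing of the sort exists in the SQLite API today, the name and semantics are my invention, and the second function just shows what callers currently have to do by hand:

    #include <sqlite3.h>

    /* Proposed (hypothetical) API: open a second connection to the same
       database and carry over all per-connection registrations. */
    int sqlite3_clone(sqlite3 *existing, sqlite3 **out_new);

    /* What the caller has to do today: reopen and re-register everything. */
    static int clone_by_hand(sqlite3 *src, const char *path, sqlite3 **out) {
        int rc = sqlite3_open(path, out);
        if (rc != SQLITE_OK) return rc;
        /* ...repeat every sqlite3_create_collation / sqlite3_create_function /
           sqlite3_set_authorizer call and every ATTACH that was applied to
           `src`, then re-prepare any statements the caller wants to reuse... */
        return SQLITE_OK;
    }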

Igor Tandetnik
