I'm trying to piece together the thread safety guarantees that SQLite
provides. They don't appear to be spelled out explicitly, so I'm
attempting to infer the rules from the documents describing SQLite
internals, as well as the recent "file locks on Linux have thread
affinity" story.
Here is my understanding of the situation - I would greatly appreciate
it if SQLite experts would confirm or deny it.
For the purposes of this discussion, every connection handle (sqlite3*)
and every statement handle (sqlite3_stmt*) can be in one of two states -
"safe" and "unsafe". Various API calls transfer handles between these
states. While in a safe state, a handle can be passed freely between
threads. As soon as a call puts a handle into an unsafe state, all
further calls must arrive on the same thread until the handle becomes
safe again. Various levels of thread safety are determined by exactly
what calls transition handles between states.
I can think of three thread safety levels, arranged from weakest to
strongest. My question boils down to which level describes reality most
closely.
1. All handles are always unsafe, from sqlite3_open to sqlite3_close and
from sqlite3_prepare to sqlite3_finalize. All calls referring to a
particular connection and all statements associated with it must occur
on the same thread.
This is obviously a safe assumption, but also least useful in many
practical situations.
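Just so we're all describing the same thing, here is a minimal sketch of
what level 1 permits (the query text is only an example): every call
touching the connection and its statements happens on the one thread
that opened it.

    #include <sqlite3.h>
    #include <stdio.h>

    /* Level 1: one thread owns the connection and all its statements for
       their entire lifetime; no handle is ever seen by another thread. */
    static int run_on_one_thread(const char *path)
    {
        sqlite3 *db = 0;
        sqlite3_stmt *stmt = 0;
        int rc = sqlite3_open(path, &db);
        if (rc != SQLITE_OK) { sqlite3_close(db); return rc; }
        rc = sqlite3_prepare(db, "SELECT count(*) FROM sqlite_master",
                             -1, &stmt, 0);
        if (rc == SQLITE_OK) {
            if (sqlite3_step(stmt) == SQLITE_ROW)
                printf("%d objects\n", sqlite3_column_int(stmt, 0));
            sqlite3_finalize(stmt);
        }
        sqlite3_close(db);
        return rc;
    }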
2. A connection is safe as long as there is no activity on it - no open
transaction and no outstanding statements. The connection is born safe by
sqlite3_open, becomes unsafe as soon as a "begin transaction" is
executed or sqlite3_prepare is called, and becomes safe again when the
transaction (if any) is committed or rolled back and the last statement
is finalized.
A statement is always unsafe.
This assumption allows creating a connection pool used by a thread pool.
A worker thread grabs an idle connection, executes a batch of statements
on it, and returns it back to the pool where another thread can now use
it.
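A rough sketch of what I mean, assuming level 2 holds (the pool
structure and names are my own, not anything in SQLite):

    #include <sqlite3.h>
    #include <pthread.h>

    #define POOL_SIZE 4

    static sqlite3 *pool[POOL_SIZE];
    static int in_use[POOL_SIZE];
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    /* A worker grabs an idle connection; it is only handed out while it
       has no open transaction and no statements, i.e. while "safe". */
    static sqlite3 *pool_acquire(void)
    {
        sqlite3 *db = 0;
        pthread_mutex_lock(&pool_lock);
        for (int i = 0; i < POOL_SIZE; i++) {
            if (pool[i] && !in_use[i]) { in_use[i] = 1; db = pool[i]; break; }
        }
        pthread_mutex_unlock(&pool_lock);
        return db;   /* NULL if nothing is idle; the caller could wait */
    }

    /* The caller must have committed or rolled back and finalized all
       statements, so the handle is "safe" again before it comes back. */
    static void pool_release(sqlite3 *db)
    {
        pthread_mutex_lock(&pool_lock);
        for (int i = 0; i < POOL_SIZE; i++)
            if (pool[i] == db) in_use[i] = 0;
        pthread_mutex_unlock(&pool_lock);
    }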
I'm not sure how useful this optimization is. I have some experience
with traditional client-server databases, where establishing a
connection is a pretty expensive operation and connection pooling is
important. How expensive is sqlite3_open? Does the answer change if the
client code has to register a few custom collations, custom functions
and such every time it opens a connection? Is it worth it to maintain a
connection pool, or is it fine to just open a new connection every time
I need one?
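To make the question concrete, the per-connection setup I have in mind
looks roughly like this; my_collation and my_func are placeholders for
whatever the application actually registers:

    #include <sqlite3.h>
    #include <string.h>

    /* Placeholder collation: byte order, shorter string first on a tie. */
    static int my_collate(void *unused, int n1, const void *s1,
                          int n2, const void *s2)
    {
        int n = n1 < n2 ? n1 : n2;
        int r = memcmp(s1, s2, n);
        return r ? r : n1 - n2;
    }

    /* Placeholder scalar function: returns its argument's length in bytes. */
    static void my_func(sqlite3_context *ctx, int argc, sqlite3_value **argv)
    {
        sqlite3_result_int(ctx, sqlite3_value_bytes(argv[0]));
    }

    static int open_configured(const char *path, sqlite3 **out)
    {
        int rc = sqlite3_open(path, out);
        if (rc != SQLITE_OK) return rc;
        sqlite3_create_collation(*out, "my_collation", SQLITE_UTF8,
                                 0, my_collate);
        sqlite3_create_function(*out, "my_func", 1, SQLITE_UTF8,
                                0, my_func, 0, 0);
        return SQLITE_OK;
    }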
3. A statement is safe right after sqlite3_prepare, becomes unsafe on
the first sqlite3_step call, and safe again after sqlite3_reset. In
other words, a statement can be transferred between threads as long as it
does not touch actual data.
A connection is safe as long as there are no open transactions or
unsafe queries. As soon as a transaction opens or one statement is being
stepped through, all activity should happen on the same thread. Once the
activity stops (but there may still be freshly prepared or reset
statements), the connection is safe again and can be transferred to a
new thread.
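If assumption 3 holds, the points at which a statement could change
threads would look like this (a sketch of the assumption, obviously not
a statement of fact):

    #include <sqlite3.h>

    /* Assumption 3: the statement is transferable while it is freshly
       prepared or reset; between the first sqlite3_step and the next
       sqlite3_reset it must stay on one thread. */
    static int one_insert(sqlite3 *db)
    {
        sqlite3_stmt *stmt = 0;
        int rc = sqlite3_prepare(db, "INSERT INTO t(x) VALUES(?)",
                                 -1, &stmt, 0);
        if (rc != SQLITE_OK) return rc;
        /* safe: prepared, no data touched yet */

        sqlite3_bind_int(stmt, 1, 42);
        rc = sqlite3_step(stmt);      /* unsafe from here... */
        sqlite3_reset(stmt);          /* ...until the reset: safe again */

        /* safe: could be handed to another thread, if the assumption holds */
        sqlite3_finalize(stmt);
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }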
This assumption allows creating a pool of objects that encapsulate an
open connection together with a bunch of prepared statements - poor
man's stored procedures if you will. I believe this may prove useful in
some situations. E.g. imagine a system that receives a stream of records
over the network and needs to insert them into the database. It
maintains a thread pool where a worker can execute a job consisting of
inserting a single record. There are only a few different kinds of
insert statements (one for each table). It would help if a worker thread
could grab a connection and use a pre-compiled statement to execute its
job, saving the time spent preparing the same queries over and over.
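Something along these lines, assuming level 3 holds (the struct, table
and function names are all mine):

    #include <sqlite3.h>

    /* The "connection plus precompiled statements" bundle.  It can only
       move between worker threads while every statement is reset and no
       transaction is open - i.e. only if assumption 3 is true. */
    typedef struct {
        sqlite3      *db;
        sqlite3_stmt *insert_record;   /* one per table in the real thing */
    } db_worker_ctx;

    static int ctx_init(db_worker_ctx *ctx, const char *path)
    {
        int rc = sqlite3_open(path, &ctx->db);
        if (rc != SQLITE_OK) return rc;
        return sqlite3_prepare(ctx->db,
            "INSERT INTO records(payload) VALUES(?)", -1,
            &ctx->insert_record, 0);
    }

    /* Job executed by whichever worker thread currently owns the context. */
    static int insert_one(db_worker_ctx *ctx, const char *payload, int nbytes)
    {
        int rc;
        sqlite3_bind_blob(ctx->insert_record, 1, payload, nbytes,
                          SQLITE_TRANSIENT);
        rc = sqlite3_step(ctx->insert_record);
        sqlite3_reset(ctx->insert_record);   /* back to the "safe" state */
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }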
So, which assumption is correct? It appears to me that all three are
compatible with the "Linux thread-unsafe locks" issue, but I'd like to
receive confirmation. And of course, there's a big chance I'm missing
something obvious.
Also, how does sqlite3_interrupt fit into the picture? It's clearly only
useful if one can call it on a thread different from the one that's busy
executing a statement on the connection. Can it be assumed that
sqlite3_interrupt can be called from any thread at any time?
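The scenario I have in mind is a watchdog thread interrupting a
long-running query on another thread's connection; something like this
(the timeout and names are arbitrary):

    #include <sqlite3.h>
    #include <pthread.h>
    #include <unistd.h>

    /* Interrupts whatever query is running on the connection after five
       seconds.  Only makes sense if sqlite3_interrupt may be called from
       a thread other than the one doing sqlite3_step. */
    static void *watchdog(void *arg)
    {
        sqlite3 *db = (sqlite3*)arg;
        sleep(5);
        sqlite3_interrupt(db);   /* the busy thread sees SQLITE_INTERRUPT */
        return 0;
    }

    /* Usage, on the thread about to run the long query:
       pthread_t t;
       pthread_create(&t, 0, watchdog, db);
       ... sqlite3_step loop ...
       pthread_join(t, 0);                                              */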
A somewhat unrelated note: I think it would be useful to introduce a
function that clones an existing connection - that is, opens a new
connection to the same database as the existing one, and registers all
the same custom collations, custom functions, an authorizer, maybe a
busy handler, maybe a set of ATTACHed databases, and so on. Even nicer
would be the ability to clone a prepared statement so that the copy is
associated with a different connection to the same database.
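No such facility exists today, of course; by hand it comes down to
something like the following, where register_all_extras is a
hypothetical application function that repeats every registration made
on the original connection:

    #include <sqlite3.h>

    /* Hypothetical: re-register collations, functions, the authorizer,
       the busy handler, re-ATTACH databases, etc. on the new connection.
       This is exactly the boilerplate a built-in "clone" would remove. */
    extern void register_all_extras(sqlite3 *db);

    static int clone_connection(const char *same_path, sqlite3 **out)
    {
        int rc = sqlite3_open(same_path, out);
        if (rc != SQLITE_OK) return rc;
        register_all_extras(*out);
        return SQLITE_OK;
    }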
Igor Tandetnik