Hi,
Just for your information, I've written a multi-threaded application
that accesses a SQLite database (well, four of them in fact). All
threads (many) use the same handle as retrieved from sqlite3_open. Each
thread locks the database by entering a critical section, performs a
query, fetches results in the case of a SELECT, and leaves the critical
section again. So no two threads simultaneously access the same db. For
performance reasons (both locking and indexing) I split my database
into four databases.
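In rough pseudo-code the pattern looks like this (sketched in Python with the stdlib sqlite3 module, since my actual code is Delphi; the lock plays the role of the critical section, and check_same_thread=False stands in for sharing one handle across threads):

```python
# Sketch of the pattern above: many threads share one connection handle,
# and a lock (the "critical section") serializes every query against it.
import sqlite3
import threading

db_lock = threading.Lock()
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE events (thread_name TEXT, n INTEGER)")

def worker(name, count):
    for n in range(count):
        with db_lock:                      # enter critical section
            conn.execute("INSERT INTO events VALUES (?, ?)", (name, n))
        # critical section left; another thread may now use the handle

threads = [threading.Thread(target=worker, args=(f"t{i}", 100))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with db_lock:
    total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total)  # -> 400 (4 threads x 100 inserts each)
```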
Initially I let each thread start and end a transaction each time.
However, again for performance reasons, I decided to use another
method. A transaction is started only at application startup. For many
minutes, I let the journal file grow, filled by many threads; then,
once in a while, the transaction is committed and restarted, so that
the journal file gets merged with the database file. Although this all
seems to work perfectly on Windows, I feel obliged to warn against this
practice if your data is actually _important_. However, in my case I
was fighting performance, and didn't really care if some data was lost
in the case of a computer crash. As long as the database itself does
not get corrupted, that is fine with me. If your data is important, be
sure to start/end transactions at the cost of (much) more disk access.
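The periodic-commit scheme boils down to something like this (again a Python sketch rather than my actual Delphi code; the commit threshold and counter are illustrative, and isolation_level=None is Python's way of getting manual BEGIN/COMMIT control):

```python
# Sketch of the periodic-commit scheme: one long-running transaction
# absorbs many writes, and only every COMMIT_EVERY operations is it
# committed and a new one begun.  Writes since the last commit are lost
# on a crash, but the database file itself stays consistent.
import sqlite3
import threading

COMMIT_EVERY = 1000              # illustrative threshold

lock = threading.Lock()
conn = sqlite3.connect(":memory:", check_same_thread=False,
                       isolation_level=None)   # manual transactions
conn.execute("CREATE TABLE log (msg TEXT)")
conn.execute("BEGIN")            # transaction opened once, up front
pending = 0

def write(msg):
    """Called by many threads; commits only once in a while."""
    global pending
    with lock:
        conn.execute("INSERT INTO log VALUES (?)", (msg,))
        pending += 1
        if pending >= COMMIT_EVERY:
            conn.execute("COMMIT")   # merge journal into the db file
            conn.execute("BEGIN")    # immediately start the next batch
            pending = 0

for i in range(2500):
    write(f"message {i}")
# 2000 rows are now committed; 500 sit in the still-open transaction
```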
So, concluding: if you have a decent thread locking mechanism, you can
safely use SQLite and do anything you like as if a single thread were
accessing the db. On Windows, that is. Development environment: Delphi
5/7/2005 + libsql.
Although not an exact answer to your questions, I hope this information
is useful.
regards,
rene
Igor Tandetnik wrote:
I'm trying to piece together the thread safety guarantees that SQLite
provides. They don't appear to be spelled out explicitly. So I'm
trying to infer the rules from the documents describing SQLite
internals, as well as the recent "file locks on linux have thread
affinity" story.
Here is my understanding of the situation - I would greatly appreciate
it if SQLite experts would confirm or deny it.
For the purposes of this discussion, every connection handle (sqlite*)
and every statement handle (sqlite_stmt*) can be in one of two states
- "safe" and "unsafe". Various API calls transfer handles between
these states. While in a safe state, a handle can be passed freely
between threads. As soon as a call puts a handle into an unsafe state,
all further calls must arrive on the same thread until the handle
becomes safe again. Various levels of thread safety are determined by
exactly what calls transition handles between states.
I could think of three thread safety levels, arranged from weakest to
strongest. My question boils down to which level describes reality
most closely.
1. All handles are always unsafe, from sqlite3_open to sqlite3_close
and from sqlite3_prepare to sqlite3_finalize. All calls referring to
a particular connection and all statements associated with it must
occur on the same thread.
This is obviously a safe assumption, but also least useful in many
practical situations.
2. A connection is safe as long as there is no activity on it - there
is no open transaction and no statements. The connection is born safe
by sqlite3_open, becomes unsafe as soon as a "begin transaction" is
executed or sqlite3_prepare is called, and becomes safe again when the
transaction (if any) is committed or rolled back and the last
statement is finalized.
A statement is always unsafe.
This assumption allows creating a connection pool used by a thread
pool. A worker thread grabs an idle connection, executes a batch of
statements on it, and returns it back to the pool where another thread
can now use it.
I'm not sure how useful this optimization is. I have some experience
with traditional client-server databases, where establishing a
connection is a pretty expensive operation and connection pooling is
important. How expensive is sqlite3_open? Does the answer change if
the client code has to register a few custom collations, custom
functions and such every time it opens a connection? Is it worth it to
maintain a connection pool, or is it fine to just open a new
connection every time I need one?
3. A statement is safe right after sqlite3_prepare, becomes unsafe on
the first sqlite3_step call, and safe again after sqlite3_reset. In
other words, a statement can be transferred between threads as long as
it does not touch actual data.
A connection is safe as long as there are no open transactions or
unsafe queries. As soon as a transaction opens or one statement is
being stepped through, all activity should happen on the same thread.
Once the activity stops (but there may still be freshly prepared or
reset statements), the connection is safe again and can be transferred
to a new thread.
This assumption allows creating a pool of objects that encapsulate an
open connection together with a bunch of prepared statements - poor
man's stored procedures if you will. I believe this may prove useful
in some situations. E.g. imagine a system that receives a stream of
records over the network and needs to insert them into the database.
It maintains a thread pool where a worker can execute a job consisting of
inserting a single record. There are only a few different kinds of
insert statements (one for each table). It would help if a worker
thread can grab a connection and use a pre-compiled statement to
execute its job. Saves some time preparing the same queries over and
over.
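The "poor man's stored procedures" object I have in mind would look roughly like this (a Python sketch; Python's sqlite3 module has no explicit prepare/reset, but its per-connection statement cache, controlled by the cached_statements argument and keyed by SQL text, approximates the compile-once/reuse behavior):

```python
# Sketch of assumption 3's pattern: a poolable object bundling one
# connection with a fixed set of per-table INSERT statements.  In the
# C API these would be sqlite3_prepare'd once and reset after each use;
# here, identical SQL text hits sqlite3's internal statement cache, so
# each statement is compiled once and merely rebound on reuse.
import sqlite3

class PreparedConnection:
    STATEMENTS = {                   # illustrative, one per table
        "users":  "INSERT INTO users  VALUES (?, ?)",
        "orders": "INSERT INTO orders VALUES (?, ?)",
    }

    def __init__(self, path):
        self.conn = sqlite3.connect(
            path, check_same_thread=False,
            cached_statements=len(self.STATEMENTS) + 8)

    def insert(self, table, row):
        self.conn.execute(self.STATEMENTS[table], row)
        self.conn.commit()           # connection returns to "safe" state

db = PreparedConnection(":memory:")
db.conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
db.insert("users", (1, "alice"))
db.insert("orders", (1, 9.99))
```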
So, which assumption is correct? It appears to me that all three are
compatible with "linux thread-unsafe locks" issue, but I'd like to
receive confirmation. And of course, there's a big chance I'm missing
something obvious.
Also, how does sqlite3_interrupt fit into the picture? It's clearly
only useful if one can call it on a thread different from the one
that's busy executing a statement on the connection. Can it be assumed
that sqlite3_interrupt can be called from any thread at any time?
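The cross-thread usage I have in mind looks like this (sketched with Python's stdlib sqlite3 module, whose Connection.interrupt() wraps sqlite3_interrupt; whether this is officially guaranteed is exactly the question):

```python
# Sketch of cross-thread sqlite3_interrupt: one thread grinds through a
# long-running query while a second thread calls interrupt() on the same
# connection, making the query fail with an "interrupted" error.
import sqlite3
import threading
import time

conn = sqlite3.connect(":memory:", check_same_thread=False)

def interrupt_soon():
    time.sleep(0.2)
    conn.interrupt()        # wraps sqlite3_interrupt(); called from a
                            # different thread than the running query

t = threading.Thread(target=interrupt_soon)
t.start()

interrupted = False
try:
    # A deliberately expensive query: a huge recursive-CTE cross join.
    conn.execute("""
        WITH RECURSIVE n(x) AS
            (SELECT 1 UNION ALL SELECT x + 1 FROM n LIMIT 100000)
        SELECT COUNT(*) FROM n a, n b
    """).fetchone()
except sqlite3.OperationalError:
    interrupted = True      # typically "interrupted"
t.join()
print(interrupted)  # -> True
```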
A somewhat unrelated note: I think it would be useful to introduce a
function that clones an existing connection - that is, opens a new
connection to the same database as the existing one, and registers all
the same custom collations, custom functions, an authorizer, maybe a
busy handler, maybe a set of ATTACHed databases and so on. Even nicer
would be an ability to clone a prepared statement so that the copy is
associated with a different connection to the same database.
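Today the clone has to be done by hand, replaying every registration onto a fresh connection. A rough sketch of such a helper (Python stdlib sqlite3; the clone_connection name and the registry structure are hypothetical illustrations, not any existing API, and with ":memory:" each connection gets its own database, whereas in practice a shared file path would be used):

```python
# Hand-rolled version of the proposed "clone connection" helper: open a
# new connection to the same database and re-register the same custom
# functions and collations on it.
import sqlite3

def clone_connection(path, functions, collations):
    conn = sqlite3.connect(path, check_same_thread=False)
    for name, narg, func in functions:
        conn.create_function(name, narg, func)
    for name, cmp in collations:
        conn.create_collation(name, cmp)
    return conn

# Registry of everything that must be replayed onto each new connection.
functions  = [("double_it", 1, lambda v: v * 2)]
collations = [("reverse", lambda a, b: (a < b) - (a > b))]

a = clone_connection(":memory:", functions, collations)
b = clone_connection(":memory:", functions, collations)

# Both connections now expose the same custom SQL surface:
val = b.execute("SELECT double_it(21)").fetchone()[0]
print(val)  # -> 42
```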
Igor Tandetnik