[sqlite] Concurrency, MVCC

Andrew Piskorski Tue, 13 Apr 2004 22:16:54 -0700

How feasible would it be to add support for higher concurrency to
SQLite, especially via MVCC?  (If it turns out to be feasible and
desirable, I'd be willing to work on it in my free time.)

What I would really like in SQLite would be:
- Good concurrency, preferably similar or better to that of nsv/tsv
(see below).
- MVCC (multi-version concurrency control) model preferred, as in
PostgreSQL and Oracle.

How feasible is that? What designs have been or should be considered
for it? How much work does this seem to be? Does it become any
easier if it's considered for in-memory databases only, rather than
both in-memory and on-disk?

After some searching around I found these previous proposals or
discussions of increasing concurrency in SQLite:

http://www.sqlite.org/concurrency.html

http://www.sqlite.org/cvstrac/wiki?p=BlueSky
http://www.sqlite.org/cvstrac/attach_get/100/shadow.pdf
http://citeseer.ist.psu.edu/131901.html

http://www.mail-archive.com/cgi-bin/htsearch?config=sqlite-users_sqlite_org&restrict=&exclude=&words=concurrency

Many of the suggestions in Dr. Hipp's 2003-11-22 concurrency draft
sound very useful, especially blocking (rather than polling) locks.
(In fact, I was surprised to learn that SQLite does NOT block on locks
now.) But, that draft never mentions MVCC at all. Why not? Since
MVCC both gives better concurrency and is friendlier to use than
pessimistic locking models, I was surprised not to at least see it
mentioned. Is an MVCC implementation thought to be too complicated?
Or?

Doug Currie's <[EMAIL PROTECTED]> "Shadow Paging" design sounds
promising. Unfortunately, I have not been able to download the
referenced papers at all (where can I get them?), but as far as I can
tell, it seems to be describing a system with the usual
Oracle/PostgreSQL MVCC semantics, EXCEPT of course that Currie
proposes that each Write transaction must take a lock on the database
as a whole.

But, other than the locking granularity, in what way is Currie's
Shadow Paging design the same or different from PostgreSQL's MVCC
implementation, both in terms of user-visible transaction semantics,
and the underlying implementation?

I believe PostgreSQL basically marks each row with a transaction id,
and keeps track of whether each transaction id is in progress,
committed, or aborted. Here are a few links about that:

http://developer.postgresql.org/pdf/transactions.pdf
http://openacs.org/forums/message-view?message_id=176198

Since Currie's design has only one db-wide write lock, it is
semantically equivalent to PostgreSQL's "serializable" isolation
level, correct? How could this be extended to support table locking
and PostgreSQL's default "read committed" isolation level? Would the
smallest locking granularity possible in Currie's design be one page
of rows, however many rows that happens to be?

The one process, many threads aspect of Currie's design sounds just
fine to me. The one write lock for the whole database, on the other
hand, could be quite limiting. How much more difficult would it be to
add table locks to the design? It would also be a nice bonus if the
design at least contemplates how to add row locks (or locks for pages
of rows) in the future, but my guess is that table locks would be good
enough in practice.

Currie's design also seems to defer writing any data to disk until the
transaction commits, which seems odd to me. I did not follow many of
the details of that design so I'm probably missing something here, but
since most write transactions commit rather than abort, in any sort of
MVCC model wouldn't it be better to write data to disk earlier rather
than later? I'm pretty sure that's what both Oracle and PostgreSQL
do.

My particular interest in SQLite:
------------------------------------

Now that I've asked lots of questions above, I'll describe some of the
real-world use cases that got me thinking about this, in case it helps
clarify how and why I'm interested in a high-concurrency SQLite:

A while back I wrote a high-performance multi-threaded application
that basically just accepted data requests, used various ugly low
level proprietary C APIs to fetch data from remote servers, and then
organized the data and fed it back to the client application as simple
rows of CSV-style text.

Using all those low-level C APIs meant tracking lots of somewhat
complicated housekeeping data. If my application ever crashed all
housekeeping data instantly became worthless anyway, so I definitely
didn't want to store it persistently in Oracle or PostgreSQL; for both
simplicity and performance, I wanted it in-memory only.

So in my case, I used AOLserver's nsv's for this. (The Tcl Thread
Extension has "tsv" which is just like "nsv", except better.) Nsvs
are basically just associative arrays for storing key/value pairs
(like Tcl arrays or Perl hashes), but nsv operations are both atomic
and highly concurrent, as they were designed for inter-thread
communication in AOLserver. (Mutex locking is automatic. Nsvs are
assigned to buckets with one mutex lock per bucket, and the number of
buckets is tunable.)

Using nsvs worked for me, but being limited to only key/value pairs
was UGLY. Much of the housekeeping data had 1-to-many or even
many-to-many mappings (e.g., each "query" might have many
"request_ids"), and being limited to key/value pairs was both painful
and the source of some bugs. And I believe that the pain and bugs
could have been a lot worse if my program and its housekeeping data
had more complicated.

What I REALLY wanted was an in-memory relational database, but I
didn't have one. When I need on-disk persistence I'd normally use
Oracle or PostgreSQL, but certainly it'd be convenient if I could use
the same lightweight both in-memory and on-disk.

So there's a hole in my toolkit, and I'm looking for software to fill
it. For that, SQLite seems to have the following important
advantages:

- Relational, transactional, ACID.
- Simple and very high quality code (so I am told; I have not yet read
any sources).
- Option of running either in-memory or on-disk.
- Pretty good support for SQL features, joins, etc. (No correlated
subqueries, but I can live with that.)
- Thread safe.
- Easily embeddable anywhere C is.

For my needs SQLite seems to have only one problem: Both readers and
writers must lock the whole database. Unfortunately, that's a big
one.

Maybe, in that one particular application I would have been able to
get away with using a single thread for all SQLite access, and having
all other threads talk to that one db thread. I have not measured
SQLite performance and concurrency, nor compared it to nsv/tsv, so I
don't really know, and I would certainly want to make those tests
before hacking on any SQLite code.

But it seems clear that whatever the exact numbers are, SQLite MUST
have much lower concurrency than either Tcl's simple and lightweight
nsv/tsv key value pairs or heavier weight databases like Oracle or
PostgreSQL - and that limits the number of places I'd be able to use
SQLite.

Solutions other than SQLite:
------------------------------------

One possible alternative to SQLite is Erlang's Mnesia RDBMS:

http://www.erlang.se/doc/doc-5.3/doc/toc_dat.html

It also works both in-memory and on-disk, apparently already has high
concurrency, and can be distributed. I have not (yet) investigated
it, and I could not tell from its docs what locking model it uses, how
powerful its query language is, etc. But most importantly, it
definitely requires a running Erlang interpreter to work at all, and
it seems very unclear whether or how well any sort of access from
non-Erlang environments would work at all.

On this list, Mark D. Anderson <[EMAIL PROTECTED]> recently
recommended looking at the "HUT HiBASE/Shades" papers:

http://hibase.cs.hut.fi/publications.shtml

The fact that Nokia funded and then canceled the HiBase project is
interesting, as Ericsson developed Erlang, which sounds similar in
some ways.

Unfortunately, it sounds as if the HiBASE project is dead and the code
was never finished. It also appears to be strictly a main-memory
database, so I'm not sure how applicable any of that work would be so
a database like SQLite (or Mnesia for that matter) which may be used
either in-memory or on-disk.

Anyone here aware of any other good alternatives?

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[sqlite] Concurrency, MVCC

Reply via email to