How feasible would it be to add support for higher concurrency to SQLite, especially via MVCC? (If it turns out to be feasible and desirable, I'd be willing to work on it in my free time.)
What I would really like in SQLite would be: - Good concurrency, preferably similar or better to that of nsv/tsv (see below). - MVCC (multi-version concurrency control) model preferred, as in PostgreSQL and Oracle. How feasible is that? What designs have been or should be considered for it? How much work does this seem to be? Does it become any easier if it's considered for in-memory databases only, rather than both in-memory and on-disk? After some searching around I found these previous proposals or discussions of increasing concurrency in SQLite: http://www.sqlite.org/concurrency.html http://www.sqlite.org/cvstrac/wiki?p=BlueSky http://www.sqlite.org/cvstrac/attach_get/100/shadow.pdf http://citeseer.ist.psu.edu/131901.html http://www.mail-archive.com/cgi-bin/htsearch?config=sqlite-users_sqlite_org&restrict=&exclude=&words=concurrency Many of the suggestions in Dr. Hipp's 2003-11-22 concurrency draft sound very useful, especially blocking (rather than polling) locks. (In fact, I was surprised to learn that SQLite does NOT block on locks now.) But, that draft never mentions MVCC at all. Why not? Since MVCC both gives better concurrency and is friendlier to use than pessimistic locking models, I was surprised not to at least see it mentioned. Is an MVCC implementation thought to be too complicated? Or? Doug Currie's <[EMAIL PROTECTED]> "Shadow Paging" design sounds promising. Unfortunately, I have not been able to download the referenced papers at all (where can I get them?), but as far as I can tell, it seems to be describing a system with the usual Oracle/PostgreSQL MVCC semantics, EXCEPT of course that Currie proposes that each Write transaction must take a lock on the database as a whole. But, other than the locking granularity, in what way is Currie's Shadow Paging design the same or different from PostgreSQL's MVCC implementation, both in terms of user-visible transaction semantics, and the underlying implementation? I believe PostgreSQL basically marks each row with a transaction id, and keeps track of whether each transaction id is in progress, committed, or aborted. Here are a few links about that: http://developer.postgresql.org/pdf/transactions.pdf http://openacs.org/forums/message-view?message_id=176198 Since Currie's design has only one db-wide write lock, it is semantically equivalent to PostgreSQL's "serializable" isolation level, correct? How could this be extended to support table locking and PostgreSQL's default "read committed" isolation level? Would the smallest locking granularity possible in Currie's design be one page of rows, however many rows that happens to be? The one process, many threads aspect of Currie's design sounds just fine to me. The one write lock for the whole database, on the other hand, could be quite limiting. How much more difficult would it be to add table locks to the design? It would also be a nice bonus if the design at least contemplates how to add row locks (or locks for pages of rows) in the future, but my guess is that table locks would be good enough in practice. Currie's design also seems to defer writing any data to disk until the transaction commits, which seems odd to me. I did not follow many of the details of that design so I'm probably missing something here, but since most write transactions commit rather than abort, in any sort of MVCC model wouldn't it be better to write data to disk earlier rather than later? I'm pretty sure that's what both Oracle and PostgreSQL do. My particular interest in SQLite: ------------------------------------ Now that I've asked lots of questions above, I'll describe some of the real-world use cases that got me thinking about this, in case it helps clarify how and why I'm interested in a high-concurrency SQLite: A while back I wrote a high-performance multi-threaded application that basically just accepted data requests, used various ugly low level proprietary C APIs to fetch data from remote servers, and then organized the data and fed it back to the client application as simple rows of CSV-style text. Using all those low-level C APIs meant tracking lots of somewhat complicated housekeeping data. If my application ever crashed all housekeeping data instantly became worthless anyway, so I definitely didn't want to store it persistently in Oracle or PostgreSQL; for both simplicity and performance, I wanted it in-memory only. So in my case, I used AOLserver's nsv's for this. (The Tcl Thread Extension has "tsv" which is just like "nsv", except better.) Nsvs are basically just associative arrays for storing key/value pairs (like Tcl arrays or Perl hashes), but nsv operations are both atomic and highly concurrent, as they were designed for inter-thread communication in AOLserver. (Mutex locking is automatic. Nsvs are assigned to buckets with one mutex lock per bucket, and the number of buckets is tunable.) Using nsvs worked for me, but being limited to only key/value pairs was UGLY. Much of the housekeeping data had 1-to-many or even many-to-many mappings (e.g., each "query" might have many "request_ids"), and being limited to key/value pairs was both painful and the source of some bugs. And I believe that the pain and bugs could have been a lot worse if my program and its housekeeping data had more complicated. What I REALLY wanted was an in-memory relational database, but I didn't have one. When I need on-disk persistence I'd normally use Oracle or PostgreSQL, but certainly it'd be convenient if I could use the same lightweight both in-memory and on-disk. So there's a hole in my toolkit, and I'm looking for software to fill it. For that, SQLite seems to have the following important advantages: - Relational, transactional, ACID. - Simple and very high quality code (so I am told; I have not yet read any sources). - Option of running either in-memory or on-disk. - Pretty good support for SQL features, joins, etc. (No correlated subqueries, but I can live with that.) - Thread safe. - Easily embeddable anywhere C is. For my needs SQLite seems to have only one problem: Both readers and writers must lock the whole database. Unfortunately, that's a big one. Maybe, in that one particular application I would have been able to get away with using a single thread for all SQLite access, and having all other threads talk to that one db thread. I have not measured SQLite performance and concurrency, nor compared it to nsv/tsv, so I don't really know, and I would certainly want to make those tests before hacking on any SQLite code. But it seems clear that whatever the exact numbers are, SQLite MUST have much lower concurrency than either Tcl's simple and lightweight nsv/tsv key value pairs or heavier weight databases like Oracle or PostgreSQL - and that limits the number of places I'd be able to use SQLite. Solutions other than SQLite: ------------------------------------ One possible alternative to SQLite is Erlang's Mnesia RDBMS: http://www.erlang.se/doc/doc-5.3/doc/toc_dat.html It also works both in-memory and on-disk, apparently already has high concurrency, and can be distributed. I have not (yet) investigated it, and I could not tell from its docs what locking model it uses, how powerful its query language is, etc. But most importantly, it definitely requires a running Erlang interpreter to work at all, and it seems very unclear whether or how well any sort of access from non-Erlang environments would work at all. On this list, Mark D. Anderson <[EMAIL PROTECTED]> recently recommended looking at the "HUT HiBASE/Shades" papers: http://hibase.cs.hut.fi/publications.shtml The fact that Nokia funded and then canceled the HiBase project is interesting, as Ericsson developed Erlang, which sounds similar in some ways. Unfortunately, it sounds as if the HiBASE project is dead and the code was never finished. It also appears to be strictly a main-memory database, so I'm not sure how applicable any of that work would be so a database like SQLite (or Mnesia for that matter) which may be used either in-memory or on-disk. Anyone here aware of any other good alternatives? -- Andrew Piskorski <[EMAIL PROTECTED]> http://www.piskorski.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]