G'day,
"D. Richard Hipp" <[EMAIL PROTECTED]> 12/06/2004 08:16 AM To: [EMAIL PROTECTED] cc: Subject: [sqlite] Locking and concurrency in SQLite version 3.0 > http://www.sqlite.org/lockingv3.html My thoughts are listed as they come to me. Thought 1: Section 5.0, entitled "Writing to a database file": After the in-memory cache initially spills to disk the exclusive lock must be maintained because the database file is changed. One way to avoid this happening might be to change the database file and log structure as follows: 1) Add a numeric entry to each page in the database file that refers to a specific page (or file offset) into the log file. 2) Add an entry to each log file entry indicating a 0 or 1. If a page is read from the database and has a non-zero file offset, that page refers to the roll-forward log that superceeds it. A zero in the log file entry indicates it is a rollback entry, while a 1 indicates it is a roll-forward entry. The algorithms described would change in the following ways: 1) Instead of writing the dirty page to the main file when memory spilliage occurs, write it to the journal. If the main file entry already has a file offset encoded into it, write the page to that offset. If the main file entry has no offset, write it at the end of the journal file and overwrite only the offset of the main page. 2) Readers with shared locks should always overlook any such offsets it finds in main files. Readers with any of the writer locks should refer to the journal for the updated version of such pages. 3) When rolling back a journal file, only rollback pages with a 0 entry in the rollback/roll-forward field. 4) When committing a transaction write all pages from memory, but also commit any pages in the journal with a 1 in the rollback/roll-forward field. 5) You might have to rethink any vacuum operation and some other small aspects of life. By using the main file as an index into the roll-forward log you make truncating the database file more difficult. One extra alternative to throw in is to keep the roll-back and roll-forward journals in separate files. That would avoid the need to identify the individual log entries as roll-back or roll-forward and may improve performance of large changes. The roll-forward file would never have to be committed. This approach differs slightly from previous suggestions of the shadow pager or of creating tree structures in the journal file. It does not completely virtualise the pager level, although the concept is similar. It requires only trivial extra structure in the journal file since it uses the real main file as an index into the roll-forward section of the journal. If this kind of scheme were to be implimented in the future the groundwork in file format changes could be laid now in a forward-compatable way by allocating the necessary spaces and always ensuring they had a zero value. Thought 2: I'm a little concerned about when SQLITE_BUSY can be returned. In section 7.0, entitled "Transaction Control At The SQL Level" a mention is made of locks not be acquired with the BEGIN statement. Personally I don't like to see SQLITE_BUSY at all. I currently modify my sqlite version to use blocking locks in restricted ways to avoid getting the message and ensure optimum fairness. If they do occur, I would prefer they happen at well-defined and designated places. Hmmm... I guess I can't think of any cases where this is really an issue, though. I would like to see blocking locks supported by SQLITE. If that's not possible it's ok, but my preference is that the capability should exist. Currently sqlite provides an API to execute a function when SQLITE_BUSY would be returned. That's ok, but doesn't suit blocking locks well for two reasons: 1) The locking semantics of sqlite use operating system locks in specific ways that would be unwise to mess with in a callback function. 2) I don't belive there is an API to register a corresponding unlock function to the sqlite_busy_callback, so whatever locks might be put in place can't be unmade at appropriate times. Perhaps the API should be changed to support replacement of the various os.c lock functions for each of the specific lock types in the new sqlite locking model. As a matter of interest, the current sqlite isn't far off being able to work with blocking lock in place of its existing non-blocking locks. The main prohibition that needs to be imposed is that shared locks cannot be upgraded to exclusive locks. The current sqlite can be "tuned" to ensure exclusive locks are obtained early to prevent blocking locks from deadlocking. I haven't seen the new sqlite3 code and haven't seen detail of how the various locking mode transitions will be implimented in a posix environment to know whether extra problems will be introduced in this area. By my reading the allowable transitions will be these: Unlocked -> Shared (non-blocking, may return busy) Shared -> Pending (non-blocking, my return busy) Exclusive -> Shared (will always work immediately) Shared -> Reserved (non-blocking, may return busy) Reserved -> Pending (will always work immediately when preparing to write?) Pending -> Exclusive (non-blocking, may return busy when preparing to rollback) Pending -> Exclusive (blocking, returns succesful eventually when preparing to write) Again, I'm not sure of the exact posix operations involved in each transition but I read this as a lot of shared locks that are transfomed into exclusive locks only twice: Once in a non-blocking manner during rollback, and once in a blocking manner during write operations. More may be introduced in the fiddling with reserved and pending locks. I would guess that Shared -> Reserved requires this at least once, and that Reserved -> Pending does too. Either way, it looks like blocking locks could be used most of the time, at least in principle. Hmmm... I'm rambling a little... but to get to the point: Making blocking locks can be tricky. It kinda looks like it'll be trickier under the new system. It's only natural when you're trying to avoid holding exclusive locks for any extended period of time, but still guaranteeing that you will be able to get one when the time is right. I suppose the balance is right there, and it'll be my code that has to change in the end to get the most of this sqlite version. Anything that preserves the new functionality and reduces the opportunities for getting SQLITE_BUSY would be good, though. Benjamin. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]