"Shailesh N. Humbad" <[EMAIL PROTECTED]> wrote: > I wrote an article suggesting a method for improving concurrency in > SQLite. The main reason I wrote this article is so I can understand > SQLite better, not because my proposal is new or usable. I am not an > expert in SQLite or databases, so I would appreciate any comments or > criticisms. Here is the link: > > http://www.somacon.com/p369.php >
This is a well-written paper. The background sections clearly articulate the limitations of SQLite and provide a very fair comparison between SQLite and other "embedded" database engines, particularly Microsoft Jet. The paper gives a good and accurate overview of how SQLite works.

I have some issues with the proposed fine-grain locking enhancement, however. The paper proposes a scheme for allowing multiple processes to write to the same database simultaneously as long as each process modifies a separate table. The paper argues that each table in an SQLite database is a logically distinct and separate entity that does not interact with other tables. So multiple writers, each with their own rollback journal, could write to the file at the same time as long as there is no overlap in what is written.

This is true as far as it goes. The problem is that tables are not quite as distinct and separate as they appear at first glance. There are attributes that are shared by all tables in the database file. The first process to begin modifying any table would need to lock all of these shared resources to prevent any overlap. And since no other table can be modified without changes to the same shared resources, you have therefore essentially locked the entire database.

Resources shared among all tables include:

* The freelist of unused pages. As the size of tables increases and decreases, pages are taken from and returned to the common freelist.

* The size of the database file. Strange as it may seem, the file size of the database file is part of the state information of the database. That information is needed, for example, when the freelist is empty and it becomes necessary to allocate new pages from the end of the file. Imagine the complications if two separate writers attempted to extend the file at the same time.

* In autovacuum mode, there are index pages scattered about through the database file that are shared by all tables.
One might argue that the shared resources above only come into play when the size of a table increases or decreases. One could, in theory, implement a scheme where two or more simultaneous writers could coexist on separate tables as long as no more than one was adding and removing information and all the others were merely changing information. But in practice this does not seem to buy you very much.

In SQLite, it is difficult to know when you are adding or removing information versus merely changing existing information. SQLite uses variable-length records. So changing the value of an integer field from 8388607 to 8388608 causes the size of the record to grow by two bytes, for example. Changing the value back causes the record to shrink again. If the page on which that record resided was nearly full, or less than half full, then the change might cause the btree balancing algorithm to try to split the page or merge it with its neighbors.

As another example, consider changing a text field from "abc" to "xyz". The record size stays the same since the length of the text is the same. But if the column is indexed, the index entry would need to move from one part of the index to another. This might cause compaction in the part of the index the entry is moving from and expansion in the part it is moving to. Either operation involves allocating and deallocating pages, which in turn requires access to shared resources.

We could back off even further and allow multiple database readers to continue while one process was writing to a separate table. This could be done, but only with a substantial increase in code complexity and size, and a performance hit which is probably noticeable. And the current locking scheme, which holds writes in memory until the last possible moment, comes close to giving you this already, so the gains are minimal.
And, as the author of the paper notes, you can already accomplish the same thing by putting the various tables in separate database files and then using ATTACH to bring them all into the same database connection.

--
D. Richard Hipp <[EMAIL PROTECTED]>
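For completeness, that workaround looks something like this (a sketch with invented file and table names): each attached file carries its own lock, so the one-writer-at-a-time restriction becomes per-file rather than per-database, while queries can still join across the files as if they were one database.

```python
import os
import sqlite3
import tempfile

# One table per database file.
d = tempfile.mkdtemp()
orders_db = os.path.join(d, "orders.db")
users_db = os.path.join(d, "users.db")
sqlite3.connect(orders_db).executescript("CREATE TABLE orders(id, item);")
sqlite3.connect(users_db).executescript("CREATE TABLE users(id, name);")

# Bring both files into one connection with ATTACH.
db = sqlite3.connect(orders_db)
db.execute("ATTACH DATABASE ? AS u", (users_db,))
db.execute("INSERT INTO orders VALUES (1, 'widget')")
db.execute("INSERT INTO u.users VALUES (1, 'alice')")
db.commit()

# A cross-database join works as if it were a single database:
row = db.execute("SELECT name, item FROM u.users JOIN orders USING(id)").fetchone()
print(row)  # ('alice', 'widget')
```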