"Shailesh N. Humbad" <[EMAIL PROTECTED]> wrote:
> I wrote an article suggesting a method for improving concurrency in 
> SQLite.  The main reason I wrote this article is so I can understand 
> SQLite better, not because my proposal is new or usable.  I am not an 
> expert in SQLite or databases, so I would appreciate any comments or 
> criticisms.  Here is the link:
> 
> http://www.somacon.com/p369.php
> 

This is a well-written paper.  The background sections clearly
articulate the limitations of SQLite and provide a very fair
comparison between SQLite and other "embedded" database engines,
particularly Microsoft Jet.  The paper gives a good and accurate
overview of how SQLite works.

I have some issues with the proposed fine-grain locking
enhancement, however.

The paper proposes a scheme for allowing multiple processes to
write to the same database simultaneously as long as each process
modifies a separate table.  The paper argues that each table in
an SQLite database is a logically distinct and separate entity that
does not interact with other tables.  So multiple writers, each
with their own rollback journal, could write to the file at the
same time as long as there is no overlap in what is written.

This is true as far as it goes.  The problem is that tables
are not quite as distinct and separate as they appear at first
glance.  There are attributes that are shared by all tables
in the database file.  The first process to begin modifying
any table would need to lock all of these shared resources
to prevent any overlap.  And since no other table can be 
modified without touching those same shared resources, the
first writer has, in effect, locked the entire database.
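
The file-wide nature of the write lock is easy to observe.  The
following sketch (Python's sqlite3 module; the table names are
made up for illustration) has one connection begin writing one
table while a second connection tries to write a different one:

```python
import os
import sqlite3
import tempfile

# A throwaway on-disk database with two independent tables.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)
a.execute("CREATE TABLE t1(x)")
a.execute("CREATE TABLE t2(x)")

a.execute("BEGIN IMMEDIATE")            # writer A reserves the database
a.execute("INSERT INTO t1 VALUES (1)")

try:
    b.execute("BEGIN IMMEDIATE")        # writer B targets a different table,
    b.execute("INSERT INTO t2 VALUES (2)")
    locked = False
except sqlite3.OperationalError:        # but the write lock is file-wide
    locked = True

a.execute("COMMIT")
print("second writer blocked:", locked)
```

With timeout=0 the second connection gives up immediately instead
of waiting, so the busy condition surfaces as an exception even
though the two writers never touch the same table.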

Resources shared among all tables include:

   *  The freelist of unused pages.  As the size of tables
      increases and decreases, pages are taken from and
      returned to the common freelist.

   *  The size of the database file.  Strange as it may
      seem, the file size of the database file is part of
      the state information of the database.  That information
      is needed, for example, when the freelist is empty and
      it becomes necessary to allocate new pages from the
      end of the file.  Imagine the complications if two
      separate writers attempted to extend the file at the
      same time.

   *  In autovacuum mode, there are index pages scattered
      about through the database file that are shared by
      all tables.
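
The first of these is straightforward to demonstrate.  In the
sketch below (Python's sqlite3 module, default rollback journal,
autovacuum off), shrinking one table feeds the freelist that a
completely unrelated table then draws from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a(x)")
conn.execute("CREATE TABLE b(x)")
filler = [("x" * 400,) for _ in range(500)]
conn.executemany("INSERT INTO a VALUES (?)", filler)
conn.commit()

# Shrinking table a returns whole pages to the database-wide freelist...
conn.execute("DELETE FROM a")
conn.commit()
freed = conn.execute("PRAGMA freelist_count").fetchone()[0]

# ...and growing the unrelated table b allocates from that same freelist.
conn.executemany("INSERT INTO b VALUES (?)", filler)
conn.commit()
remaining = conn.execute("PRAGMA freelist_count").fetchone()[0]
print(freed, remaining)
```

The second PRAGMA reading comes back smaller than the first: the
pages released by table a were handed to table b, which is exactly
the kind of cross-table traffic a per-table lock cannot protect.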

One might argue that the shared resources above only come into
play when the size of a table increases or decreases.  One could,
in theory, implement a scheme where two or more simultaneous
writers could coexist on separate tables as long as no more than
one was adding and removing information and all the others were
merely changing information.  But in practice this does not seem
to buy you very much.  In SQLite, it is difficult to know when
you are adding or removing information versus merely changing
existing information.  SQLite uses variable-length records.
So changing the value of an integer field from 8388607 to 8388608
causes the size of the record to grow by one byte, for example.
Changing the value back causes the record to shrink again.
If the page on which that record resided was nearly full or
less than half full, then the change might cause the btree balancing
algorithm to try to split the page or merge it with its neighbors.
As another example, consider changing a text field "abc" to
"xyz".  The record size stays the same since the length of the
text is the same.  But if the column is indexed, the index entry
would need to move from one part of the index to another.  This
might cause compaction in the part of the index the entry
moves from and expansion in the part it moves to.  And
either operation involves allocating
and deallocating pages which in turn requires access to
shared resources.  
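
The integer example follows from the serial types in the
documented SQLite file format.  This sketch (not SQLite's actual
code) computes the stored size of an integer under those rules:

```python
def int_serial_size(v: int) -> int:
    """Bytes used to store integer v inside an SQLite record, per the
    documented serial types: signed big-endian values of 1, 2, 3, 4,
    6, or 8 bytes.  (The zero-byte encodings for the constants 0 and
    1 in newer schema formats are ignored here for simplicity.)"""
    for bits, size in ((8, 1), (16, 2), (24, 3), (32, 4), (48, 6)):
        if -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
            return size
    return 8

# Crossing a power-of-two boundary changes the stored size:
print(int_serial_size(8388607))   # 2**23 - 1 fits in 3 bytes
print(int_serial_size(8388608))   # 2**23 needs 4 bytes
```

Because the sizing decision is buried this deep in the record
format, the layer that would have to decide "is this update
size-neutral?" cannot know without doing most of the update's work.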

We could back off even further and allow multiple database
readers to continue while one process was writing to a separate
table.  This could be done, but only with a substantial increase
in code complexity and size, and a performance hit that would
probably be noticeable.  And the current locking scheme, which
holds writes in memory until the last possible moment, comes close
to giving you this already, so the gains are minimal.  And,
as the author of the paper notes, you can already accomplish
the same thing by putting the various tables in separate
database files then using ATTACH to bring them all into the
same database connection.
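
That workaround looks something like the following sketch (Python's
sqlite3 module; the file and table names are made up).  Each table
lives in its own file, and each file carries its own lock:

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()

# Give each table its own database file (hypothetical schema).
for name, ddl in (("users.db", "CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT)"),
                  ("logs.db", "CREATE TABLE logs(id INTEGER PRIMARY KEY, msg TEXT)")):
    c = sqlite3.connect(os.path.join(d, name))
    c.execute(ddl)
    c.commit()
    c.close()

# One connection sees both files through ATTACH; a writer busy in
# users.db does not hold any lock on logs.db, and vice versa.
conn = sqlite3.connect(os.path.join(d, "users.db"))
conn.execute("ATTACH DATABASE ? AS logdb", (os.path.join(d, "logs.db"),))
conn.execute("INSERT INTO users(name) VALUES ('alice')")
conn.execute("INSERT INTO logdb.logs(msg) VALUES ('created alice')")
conn.commit()
print(conn.execute("SELECT count(*) FROM users").fetchone()[0],
      conn.execute("SELECT count(*) FROM logdb.logs").fetchone()[0])
```

The cost is that you give up single-file deployment and cross-table
constraints, which is roughly the same trade the fine-grain locking
proposal would have to make internally.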

--
D. Richard Hipp <[EMAIL PROTECTED]>
