Hi,

I am currently designing a system where I am planning to use SQLite3 and 
would like some advice on basic architecture. The system is for a 
point-of-sale-like semi-embedded application.

The program will be running on quite limited hardware with Compact Flash 
for storage. The database consists of currently about 350k records, 
on-disk that amounts to a 48MB SQLite database file. That number is 
likely to grow over the lifetime of the application, maybe by a factor 
of two or so.

The main operation will be single-record queries, with records 
identified via the primary key or one of two other indexed columns. 
These reads are initiated by user interaction, so it is not predictable 
when they occur: there may be a few per day, or one every minute.

These reads need to be quick, that is the overriding design criterion. 
Say they may take two or three seconds at most, an arbitrary upper bound 
for the sake of discussion. At first glance that poses no problem at 
all, those reads are very fast.
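To make the read path concrete: it is just an indexed point query. A 
minimal sketch of the idea (the real program is C++; this uses Python's 
stdlib sqlite3 module for brevity, and the table and column names are 
made up):

```python
import sqlite3

# In-memory stand-in; the real file lives on the CF card.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY,"
             " barcode TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_barcode ON items (barcode)")
conn.execute("INSERT INTO items VALUES (1, '4711', 'Example item')")

def lookup_by_id(conn, key):
    # Point query on the primary key: a handful of B-tree page reads.
    return conn.execute("SELECT id, barcode, name FROM items"
                        " WHERE id = ?", (key,)).fetchone()

def lookup_by_barcode(conn, code):
    # The same, via one of the two secondary indexes.
    return conn.execute("SELECT id, barcode, name FROM items"
                        " WHERE barcode = ?", (code,)).fetchone()
```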

But there will be updates to the database as well. These happen 
cyclically at preplanned times, several times a day, and may touch 
anything from a few hundred to a few thousand records. The data is 
fetched from a website as CSV, then parsed and transformed into INSERT 
OR REPLACE statements.
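The transformation step itself is straightforward; here is a rough 
sketch of it in Python with invented column names (SQLite spells the 
upsert INSERT OR REPLACE):

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY,"
             " barcode TEXT, name TEXT)")

# Stand-in for the CSV fetched from the website.
csv_data = io.StringIO("id,barcode,name\n"
                       "1,4711,Widget\n"
                       "2,4712,Gadget\n")

def apply_update(conn, csv_file):
    reader = csv.DictReader(csv_file)
    rows = [(r["id"], r["barcode"], r["name"]) for r in reader]
    # INSERT OR REPLACE upserts by primary key; re-running the same
    # file replaces rows instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO items (id, barcode, name)"
                     " VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```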

Now the problem becomes one of concurrency: How can I ensure an upper 
bound on the time it takes to do the single-record reads in this scenario?

Technically, my program will be a single multithreaded executable 
written in C++. I can assume that it will be the only program using the 
database, so synchronisation mechanisms outside of SQLite are 
acceptable, but it would be nice to do without them.

A) The naive approach.

For the update, start a transaction, do all the INSERT OR REPLACEs in 
one batch and then commit.

I have implemented that in a predecessor version that allowed no other 
approach and is not concurrent. Users are locked out during the update, 
which is unacceptable for the new design.

I have timed the COMMIT to take anything between 30 seconds and 3 
minutes, depending on the number of updated records.

As I understand SQLite locking, no reads can be serviced during the 
time it actually takes to COMMIT the transaction. Right?

So that won't work.

B) A slightly less naive approach.

Instead of batching all the updates into one transaction, batch only 
some. Tune the batch size so that each individual COMMIT does not take 
too long and the time constraint for the reads can be met. ACIDity is 
not an issue here: partial updates (some records updated, others not) 
can be applied without any ill effect, and the remaining records would 
simply be updated as part of the next cycle.

I would have to make sure that reads get a chance even if the writing 
thread starts a fresh transaction in a tight loop. I think that can be 
done with SQLite's concurrency mechanisms, if I understand them 
correctly.
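For what it's worth, this is roughly how I picture the batched writer; 
batch size and pause are arbitrary knobs to be tuned, and again this is 
a Python stand-in for the C++ code with invented names:

```python
import sqlite3
import time

# isolation_level=None gives explicit BEGIN/COMMIT control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

def apply_in_batches(conn, rows, batch_size=500, pause=0.05):
    # Commit every batch_size rows and sleep between batches, so a
    # reader blocked on the database lock gets a chance to run.
    # (Readers would open their connections with a busy timeout --
    # sqlite3_busy_timeout in the C API -- so they wait for the lock
    # rather than fail while a COMMIT is in progress.)
    done = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.execute("BEGIN")
        conn.executemany("INSERT OR REPLACE INTO items (id, name)"
                         " VALUES (?, ?)", batch)
        conn.execute("COMMIT")
        done += len(batch)
        time.sleep(pause)  # yield the lock to waiting readers
    return done
```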

The total time for the update would of course be greatly increased by 
splitting it into many transactions. By how much remains to be tested; 
I have no idea. But that is not a big problem in itself, as long as the 
reads are still being serviced.

C) A silly idea.

I could also split the database file into two copies.

One "active" copy that is used to service the reads. And one "in 
transit" copy that is being updated.

So at the start of an update cycle I would make a copy of the active 
database file, either on the filesystem (would that be safe? A hot copy 
of a SQLite database file that is only read from?) or using SQLite 
itself.

Then I would update the copy in one transaction and commit it. After 
the commit I would somehow flag the copy as active, so that the next 
read is served from it.

This approach would decouple reads and writes at the price of added 
complexity for the switch: making sure it is all well-synchronized, 
etc. Homemade concurrency.
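A sketch of the switch, assuming SQLite's online backup API for the hot 
copy and an atomic rename for the promotion (file names invented; 
readers would re-open the active file, or check a generation flag, 
before each query):

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
ACTIVE = os.path.join(workdir, "active.db")    # served to readers
STAGING = os.path.join(workdir, "staging.db")  # being updated

# Seed the active database.
conn = sqlite3.connect(ACTIVE)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old name')")
conn.commit()
conn.close()

def update_cycle(apply_updates):
    # 1. Hot-copy the active file. SQLite's online backup API is safe
    #    to run while readers are using the source database.
    src = sqlite3.connect(ACTIVE)
    dst = sqlite3.connect(STAGING)
    src.backup(dst)
    src.close()
    # 2. Update the copy in one big transaction; readers on the
    #    active file are never blocked by it.
    apply_updates(dst)
    dst.commit()
    dst.close()
    # 3. Promote the copy with an atomic rename; readers re-open the
    #    active file before their next query.
    os.replace(STAGING, ACTIVE)
```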

Making a copy of the database on the CF card currently takes around two 
minutes. So that would have to be added on top of the update time and 
the single-batch commit time. But it's a constant (well, for a given 
database size only of course) and does not interfere with the reads. So 
no problem here.



Those are the strategies I was able to think of so far. Comments are 
welcome. Better ideas as well. Please point out my dumb errors in any case.

Ciao, MM

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
