Hi, I am currently designing a system in which I plan to use SQLite3, and I would like some advice on the basic architecture. The system is a point-of-sale-like, semi-embedded application.
The program will run on quite limited hardware with Compact Flash for storage. The database currently holds about 350k records; on disk that amounts to a 48 MB SQLite database file. That number is likely to grow over the lifetime of the application, maybe by a factor of two or so.

The main operation will be single-record queries, with records identified via the primary key or one of two other indexed columns. These reads are initiated by user interaction, so it is not predictable when they occur; there may be a few per day, or one every minute. These reads need to be quick; that is the overriding design criterion. Say they may take two or three seconds at most, an arbitrary upper bound for the sake of discussion. At first glance that poses no problem at all; such reads are very fast.

But there will be updates to the database as well. These run cyclically at preplanned times, several times a day, and may update anything from a few hundred to a few thousand records. The data is fetched from a website as CSV, then parsed and transformed into INSERT OR REPLACE statements.

Now the problem becomes one of concurrency: how can I ensure an upper bound on the time it takes to do the single-record reads in this scenario?

Technically, my program will be a single multithreaded executable written in C++. I can assume that it will be the only program using the database, so synchronisation mechanisms outside of SQLite are acceptable, but it would be nice to do without them.

A) The naive approach. For the update, start a transaction, do all the INSERT OR REPLACEs in one batch, and then commit. I implemented that in a predecessor version whose design did not allow any other approach and is not concurrent: users are locked out during the update, which is unacceptable for the new design. I have timed the COMMIT to take anything between 30 seconds and 3 minutes, depending on the number of updated records.
As I understand SQLite locking, no reads can be serviced during the time it actually takes to COMMIT the transaction. Right? So that won't work.

B) A slightly less naive approach. Instead of batching all the updates into one transaction, batch only some. Tune the size of the batches so that each individual COMMIT is short enough for the time constraint on the reads to be met. ACIDity is not an issue here: partial updates (some records updated, others not) can be applied without ill effect, and the remaining records would simply be updated as part of the next cycle. I would have to make sure that reads get a chance even if the writing process starts fresh transactions in a tight loop; I think that can be done with the SQLite concurrency system, if I understand it correctly. The total time for the update would of course be greatly increased by splitting it into many transactions. By how much remains to be tested; I have no idea. But that is not a big problem in itself, as long as the reads are still being serviced.

C) A silly idea. I could also split the database into two copies: one "active" copy that is used to service the reads, and one "in transit" copy that is being updated. At the start of an update cycle I would make a copy of the active database file, either on the filesystem (would that be safe, a hot copy of a SQLite database file that is only being read from?) or using SQLite itself. Then I would update the copy in one transaction, commit it, and after the commit somehow flag the copy as active so that the next read is served from it. This approach would decouple reads and writes, at the price of added complexity for the switch, making sure it is all well synchronized, and so on. Homemade concurrency. Making a copy of the database on the CF card currently takes around two minutes, so that would have to be added on top of the update time and the single-batch commit time.
But it is a constant (for a given database size, of course) and does not interfere with the reads, so no problem there.

Those are the strategies I have been able to think of so far. Comments are welcome, and better ideas as well. Please point out my dumb errors in any case.

Ciao, MM

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users