John Elrick wrote:
> Simon Slavin wrote:
>> On 8 Jun 2009, at 8:07pm, Mohit Sindhwani wrote:
>>
>>> I'm having a problem that I'm trying to find an elegant solution
>>> to.  I have a database that stores real-time information - this
>>> information is replaced by new values every 5 minutes and has about
>>> 30,000 entries.  Once a new database is made available, I should
>>> start using that one.
>>>
>>> This database is used as part of an HTTP server that responds to
>>> requests based on the data stored in the database.  So, I'm running
>>> into a design issue trying to solve how to "switch" the database
>>> from the old one to the new one.  With multiple incoming requests,
>>> I'm not sure how/when to update the database.  Also, due to the
>>> large number of records (30,000) that are to be put in every 5
>>> minutes, I don't think I should just keep adding to the database
>>> since it takes quite some time - it feels better to let a separate
>>> process create the database and alert my program that a new file is
>>> ready!
>>
>> You can do all this in one database without problems.  What you need
>> is a 'dataset' concept: every row in the data table belongs to a
>> particular dataset.  All you need is an integer.  And you make
>> another table in the same database with one entry that just stores
>> which dataset is the current one.
>>
>> When a client needs to consult the table with the data in it, it
>> first reads the current dataset number (an integer) from the other
>> table.  Then it reads just the rows from the data table that have
>> the same dataset number.  That's all.
>>
>> The application that updates the database keeps at least two
>> datasets in the data table: the current one, and one previous one in
>> case applications haven't finished reading it yet.  When new data
>> comes in, it adds a new dataset, updates the current-dataset table,
>> then deletes an old dataset.
>>
>> SQLite is pretty good at recovering space from deleted records.  You
>> might want to read up on VACUUM, but you shouldn't even need that.
>
> I like your idea better than mine.  I would suggest some means of
> ensuring that no process is still using a particular dataset before
> it is deleted: although five minutes may be adequate now, some future
> process may be introduced which takes longer, and it would be a bad
> thing for the data to suddenly vanish while a process was working
> with it.  That was the underlying motivation for reference counting,
> although reference counting may not be the only way of accomplishing
> this goal.
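Simon's rotation scheme fits in a few lines of SQL.  Below is a minimal sketch using Python's built-in sqlite3 module (the thread mentions Ruby for prototyping, but the SQL is identical).  The table and column names (readings, current_dataset, etc.) are invented for illustration, not taken from anyone's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (dataset INTEGER, key TEXT, value REAL);
    CREATE INDEX readings_dataset ON readings (dataset);
    -- One-row table naming the dataset that readers should use.
    CREATE TABLE current_dataset (dataset INTEGER);
    INSERT INTO current_dataset VALUES (0);
""")

def publish(conn, rows):
    """Insert a new dataset, point readers at it, drop stale ones."""
    (cur,) = conn.execute("SELECT dataset FROM current_dataset").fetchone()
    new = cur + 1
    with conn:  # one transaction for the whole switch
        conn.executemany(
            "INSERT INTO readings VALUES (?, ?, ?)",
            ((new, k, v) for k, v in rows))
        conn.execute("UPDATE current_dataset SET dataset = ?", (new,))
        # Keep one previous dataset around for readers still on it.
        conn.execute("DELETE FROM readings WHERE dataset < ?", (new - 1,))

def read(conn, key):
    """Read a value from whichever dataset is current right now."""
    (cur,) = conn.execute("SELECT dataset FROM current_dataset").fetchone()
    row = conn.execute(
        "SELECT value FROM readings WHERE dataset = ? AND key = ?",
        (cur, key)).fetchone()
    return row[0] if row else None

publish(conn, [("a", 1.0), ("b", 2.0)])
publish(conn, [("a", 1.5)])
print(read(conn, "a"))  # reads from the newest dataset: 1.5
```

Because the UPDATE of current_dataset happens in the same transaction as the inserts, readers never see a half-loaded dataset; John's concern about data vanishing mid-read is addressed here only crudely, by keeping one previous dataset alive.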
Thanks for the suggestion, Simon, and thanks for the analysis, John.
Since I started this thread, I should point out "my" requirements.  I
like the idea of the dataset.  I just don't know if inserting 30,000
records in a transaction will cause any concurrency issues.  I'm going
to prototype this (since SQLite + Ruby makes it easy to do quickly),
but in your opinions, which of these would be fastest and least hassle
in terms of speed/concurrency:

* Insert the 30,000 records into the 'dataset' directly.  The table
  into which I insert will have an index on 1 field.  I could do it in
  1 transaction or in multiple transactions of, say, 5,000 records
  each.
* Insert them all in 1 shot using a .sql file which has ".separator ,"
  and ".import newestfile.csv".
* Insert all the records (using #1 or #2) into a temporary table and
  then use an INSERT INTO ... SELECT ... to copy them across.

In terms of processing, each request is handled within 0.25 - 1.5
seconds (depending on workload) and there may be between 0 and 50
(approx.) threads reading data from a dataset at any given time.
However, requests can come at any point in the 5 minutes, so I think
maintaining at least 1 extra dataset is needed, maybe 2.  As each
request finishes, it doesn't need the data any more.  The data is
really transient - meaning it loses its value quite soon after it's
been used.

Thanks for your help, guys!

Cheers,
Mohit.
6/9/2009 | 11:55 PM

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
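Option #1 above is easy to measure directly.  The sketch below (Python's built-in sqlite3 again; the table name `t` and the 30,000 synthetic rows are made up for the benchmark, not real data) times one big transaction against batches of 5,000.  On an in-memory database the gap is small; against a file, every transaction costs at least one sync to disk, which is why fewer, larger transactions usually win:

```python
import sqlite3
import time

def bulk_insert(rows, batch_size):
    """Insert rows in transactions of batch_size; return (seconds, count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dataset INTEGER, key TEXT, value REAL)")
    conn.execute("CREATE INDEX t_key ON t (key)")  # the 1 indexed field
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO t VALUES (1, ?, ?)", rows[i:i + batch_size])
    elapsed = time.perf_counter() - start
    (count,) = conn.execute("SELECT count(*) FROM t").fetchone()
    return elapsed, count

rows = [(str(i), float(i)) for i in range(30_000)]
one_txn, n1 = bulk_insert(rows, 30_000)  # everything in 1 transaction
batched, n2 = bulk_insert(rows, 5_000)   # 6 transactions of 5,000
print(f"1 txn: {one_txn:.3f}s  6 txns: {batched:.3f}s  rows: {n1}")
```

Option #3 (load a temporary table, then INSERT INTO ... SELECT) mainly helps when you want the copy-across to be the only moment writers block readers; the insert work itself is the same 30,000 rows either way.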