Part 1. ------- I have the following schema in a SQLite db that is 430 MB on my Macbook laptop's 320 GB HFS+ formatted 7200 RPM disk with an 8 MB cache.
-- 1000,000 rows CREATE TABLE cell ( cell_id INTEGER PRIMARY KEY, met_cell_id INTEGER, 8 other INTEGER or REAL columns ) -- 38 rows CREATE TABLE lc ( lc_id INTEGER PRIMARY KEY, 56 other INTEGER or REAL columns ) -- 10 rows CREATE TABLE dist ( dist_id INTEGER PRIMARY KEY, 37 other INTEGER or REAL columns ) -- 2,920,000 CREATE TABLE met ( met_id INTEGER PRIMARY KEY, met_cell_id INTEGER, 9 other INTEGER or REAL columns ) CREATE TABLE cell_lc (cell_id INTEGER, lc_id INTEGER) CREATE TABLE cell_dist (cell_id INTEGER, dist_id INTEGER) CREATE INDEX idx_met_cell_id ON met (met_cell_id) CREATE INDEX idx_cell_lc ON cell_lc (cell_id) CREATE INDEX idx_cell_dist ON cell_dist (cell_id) I also have an R*Tree index, but that is a different story, not relevant here. I retrieve *all* data for one cell ':cell_id' using the following queries [1] First retrieve all data from cell table SELECT * FROM cell WHERE cell_id = :cell_id [2] Now retrieve the related lc, dist and met SELECT lc.* FROM lc l JOIN cell_lc c on l.lc_id = c.lc_id WHERE c.cell_id = :cell_id [3] Retrieve the related dist SELECT d.* FROM dist d JOIN cell_lc c on d.dist_id = c.dist_id WHERE c.cell_id = :cell_id [4] Retrieve the related met SELECT * FROM met WHERE met_cell_id = <met_cell_id from query [1] above> I did some benchmarking with the above schema using Perl DBI, and I get about 30 transactions per second as long as I returning the data to memory. [08:38 AM] ~/Data/carbonmodel$perl carbonmodel.pl timethis 1: 0 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ 33.33/s (n=1) timethis 10: 0 wallclock secs ( 0.31 usr + 0.02 sys = 0.33 CPU) @ 30.30/s (n=10) timethis 100: 3 wallclock secs ( 2.85 usr + 0.20 sys = 3.05 CPU) @ 32.79/s (n=100) timethis 1000: 33 wallclock secs (31.08 usr + 1.22 sys = 32.30 CPU) @ 30.96/s (n=1000) if I write the data to file, the speed drops to about 1 transaction per second timethis 1000: 783 wallclock secs (732.26 usr + 18.22 sys = 750.48 CPU) @ 1.33/s (n=1000) Even if I stick with manipulating the data in memory, at 30 transactions per second (or 33 ms per transaction), it would take more than 9 hours to query each of the 1 million cells one by one. In the real world, I will first find the relevant cell ids based on lat-lon bounds (hence my R*Tree index) and then extract their data one by one. How can I, if at all, speed this up? Part 2. ------- Alternatively, I could denormalize the data completely. Inspired by a post on the Flickr blog (http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/), in particular the para "To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc" I decided to experiment with the same technique. So... CREATE TABLE cell_blobs (cell_id INTEGER PRIMARY KEY, cell_data BLOB); I then queried each cell as in Part 1, serialized it and stored it in the cell_blobs table. My intent is to simply retrieve a BLOB and deserialize it... it would *possibly* be quicker than 33 ms per retrieval. Well, I haven't yet completed this test because each BLOB is taking about 430 KB. At 1 million rows, that is going to occupy upward of 400 GB. I broke the load_blob_table routine after about a third of the records had been processed because I found even the loading_the_blobs to be excruciatingly slow. Suggestions? I am posting the same query on Perlmonks so hopefully I will have a wealth of suggestions. -- Puneet Kishor http://www.punkish.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Carbon Model http://carbonmodel.org/ Open Source Geospatial Foundation http://www.osgeo.org/ Sent from: Madison WI United States. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users