Am 28.06.2013 14:54, schrieb Christopher W. Steenwyk:
I have a rather large database (11 GB) that has two tables (one with approximately 500,000 rows and another with approximately 50,000,000 rows). In this database I am performing a query that joins these two tables to produce approximately 4.4 billion rows and then uses a GROUP BY and COUNT to return some values to me. I struggled with the slowness of this query for a while in linux and through covering indexes, ANALYZE, and a SSD I was able to get the query to run in about 15 minutes which I am OK with.
... speaking from a pure technical (or better, hardware-) perspective - the problem should not take that long. It is IO-Bound apparently (especially on Windows) - but even on Linux your 15 minutes (which is roughly 1000 seconds for ease of calculation) seem way too much. The sustained Disk-Read-Throughput of a modern SSD is between 300 and 550MB per second. Let's take 300MB/sec - with that rate, your 11GB should be "spooled-through" (passing your own "Software-Filter" along the way) in roughly 1GB any 3 seconds - or say: about 30-40 seconds for the whole 11GB. Most (or at least "many") of the problems, which cause such huge amounts of data, are pretty "narrow" - and if that ist the case, then one doesn't always need to tackle them with a "generic, all- purpose SQL-engine". If you end up, having only 2 or 3 "typical Query- or Filter-criteria" to perform against those 11GB - then why not "hand-code" your filters, and store your data in dedicated binary files? E.g. your smaller set of 500,000 Rows looks like (at least for the Join-Operation) it could be stored separately from the other data - and before you perform the Grouping-Join, could be loaded into memory completely. Done so, you could perform your grouping-filter then, shoveling only your larger set of data (in binary-format, directly from Disk with about 300MB/sec) into memory - and then compare it there against your already cached, smaller "Join-and-Group-criteria-set". Olaf _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users