Am 28.06.2013 14:54, schrieb Christopher W. Steenwyk:

I have a rather large database (11 GB) that has two tables (one with
approximately 500,000 rows and another with approximately 50,000,000 rows).
  In this database I am performing a query that joins these two tables to
produce approximately 4.4 billion rows and then uses a GROUP BY and COUNT
to return some values to me.

I struggled with the slowness of this query for a while in linux and
through covering indexes, ANALYZE, and a SSD I was able to get the query to
run in about 15 minutes which I am OK with.

... speaking from a pure technical (or better, hardware-)
perspective - the problem should not take that long.

It is IO-Bound apparently (especially on Windows) - but even on
Linux your 15 minutes (which is roughly 1000 seconds for ease of
calculation) seem way too much.
The sustained Disk-Read-Throughput of a modern SSD is between 300
and 550MB per second. Let's take 300MB/sec - with that rate, your
11GB should be "spooled-through" (passing your own "Software-Filter"
along the way) in roughly 1GB any 3 seconds - or say: about 30-40
seconds for the whole 11GB.

Most (or at least "many") of the problems, which cause such huge
amounts of data, are pretty "narrow" - and if that ist the case,
then one doesn't always need to tackle them with a "generic, all-
purpose SQL-engine".

If you end up, having only 2 or 3 "typical Query- or Filter-criteria"
to perform against those 11GB - then why not "hand-code" your filters,
and store your data in dedicated binary files?

E.g. your smaller set of 500,000 Rows looks like (at least for the
Join-Operation) it could be stored separately from the other data -
and before you perform the Grouping-Join, could be loaded into memory
completely.
Done so, you could perform your grouping-filter then, shoveling only
your larger set of data (in binary-format, directly from Disk with
about 300MB/sec) into memory - and then compare it there against
your already cached, smaller "Join-and-Group-criteria-set".


Olaf


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to