You may be interested in this article: http://www.drdobbs.com/parallel/multithreaded-file-io/220300055?pgno=2
Multi-threaded reading of multiple files (which is basically what you're talking about by splitting a file in half) is only faster if you have multiple disks (in that article it's a RAID-5 system). Random I/O gains a bit from threading due to the probability of intersecting common disk blocks. (A rough sketch of approximating the two-way split from application code is at the end of this message.)

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Udi Karni [uka...@gmail.com]
Sent: Wednesday, August 01, 2012 2:25 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Windows I/O (was: Initial read speed greater than subsequent)

You are right. True parallel query can get very complicated. I was hoping for something very limited for starters - for example -

- only 2 processes
- only for simple full scans where the block range can be divided in two
- only when there is no ORDER/GROUP BY where sub-results from the 2 threads have to be combined

Basically only for queries such as SELECT COUNT/MIN/MAX FROM TABLE WHERE....

Sounds very limited / what's-the-point kind of thing - but it would actually be very useful when working with large data, where you find yourself doing a lot of QA and study of the data - "how many rows have this range of codes / are null", etc. Having 2 processes working simultaneously might cut run times in half and save many minutes. Going higher than 2 might hit disk read limitations anyway, so 2 might be plenty for version 1.

In other words - nothing grand - just a small optimization that kicks in on simple stuff. Pick some low-hanging fruit. A "would be nice" if not too complicated.

On Wed, Aug 1, 2012 at 5:57 PM, Christian Smith <csm...@thewrongchristian.org.uk> wrote:

> On Sat, Jul 14, 2012 at 03:17:07PM +0100, Simon Slavin wrote:
> >
> > On 14 Jul 2012, at 3:12pm, Udi Karni <uka...@gmail.com> wrote:
> >
> > > I know nothing about writing DB engines - so I don't know whether
> > > adding a 2nd parallel process adds 10K or 10M to the code base.
> >
> > You've reached the limit of what I know about parallelization. I hope
> > someone else can chime in.
>
> Using SQLite's VM architecture, I would guess that adding this sort of
> parallelization would be non-trivial. You need a parallel VM, significantly
> different to the current sequential VM, and at least a way of managing
> asynchronous IO, with perhaps a callback mechanism into the VM to handle
> IO completion. <shudder>
>
> While not certain, I guess other databases handle this by using tree-based
> execution plans, where any single execution node can easily be split into
> branches on another thread/process/machine, then merged in the parent tree
> node, with each branch handling a certain key range.
>
> This might make sense, for example, with a partitioned table, where each
> partition is on its own spindle, so a full table scan can be executed in
> parallel on each spindle and merged as a final step. So, for a table scan
> between k0 and k3, find intermediate keys to split the query between
> spindles:
>
>          (k0-k3)
>           / | \
>          /  |  \
>         /   |   \
>        /    |    \
>       /     |     \
> (k0-k1]  (k1-k2]  (k2-k3)
>    |        |        |
>  disk1    disk2    disk3
>
> I sat through an Oracle internals course once, and the instructor gave us
> an example of a setup such as this where data was partitioned across 24
> disks, and the resulting full table scans were in fact quicker than
> index-based scans for the data set they were using.
> Of course, the above would be useless for SQLite anyway, being a
> single-file database. And even with the likes of Oracle, Stripe And Mirror
> Everything (SAME) might also largely defeat parallel scans.
>
> All in all, the added bloat would be measured in MB, rather than KB.
>
> Christian
>
> disclaimer: Not a practical DB implementation expert.
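For what it's worth, the kind of two-way split described above can be approximated today from application code, without touching the VDBE: open two read-only connections, hand each one half of the rowid range, and combine the partial aggregates at the end. Below is a minimal sketch in C using two threads rather than two processes; the file name "test.db", the table "t", and the column "x" are made-up placeholders, and error and NULL handling are omitted for brevity.

/* parallel_count.c - rough sketch of a two-way COUNT/MIN/MAX split.
** Placeholders: database "test.db", table "t", numeric column "x".
** Build: gcc parallel_count.c -lsqlite3 -lpthread -o parallel_count
*/
#include <pthread.h>
#include <stdio.h>
#include <sqlite3.h>

typedef struct {
  sqlite3_int64 lo, hi;      /* rowid range scanned by this worker */
  sqlite3_int64 count;       /* partial COUNT(*)                   */
  double min, max;           /* partial MIN(x) / MAX(x)            */
} Slice;

static void *scan_slice(void *arg){
  Slice *s = (Slice*)arg;
  sqlite3 *db;
  sqlite3_stmt *st;

  /* Each worker uses its own read-only connection. */
  sqlite3_open_v2("test.db", &db, SQLITE_OPEN_READONLY, 0);
  sqlite3_prepare_v2(db,
    "SELECT COUNT(*), MIN(x), MAX(x) FROM t WHERE rowid BETWEEN ?1 AND ?2",
    -1, &st, 0);
  sqlite3_bind_int64(st, 1, s->lo);
  sqlite3_bind_int64(st, 2, s->hi);
  if( sqlite3_step(st)==SQLITE_ROW ){
    s->count = sqlite3_column_int64(st, 0);
    s->min   = sqlite3_column_double(st, 1);
    s->max   = sqlite3_column_double(st, 2);
  }
  sqlite3_finalize(st);
  sqlite3_close(db);
  return 0;
}

int main(void){
  sqlite3 *db;
  sqlite3_stmt *st;
  sqlite3_int64 lo = 0, hi = 0, mid;
  pthread_t tid[2];
  int i;

  /* Find the overall rowid range so it can be cut in two. */
  sqlite3_open_v2("test.db", &db, SQLITE_OPEN_READONLY, 0);
  sqlite3_prepare_v2(db, "SELECT MIN(rowid), MAX(rowid) FROM t", -1, &st, 0);
  if( sqlite3_step(st)==SQLITE_ROW ){
    lo = sqlite3_column_int64(st, 0);
    hi = sqlite3_column_int64(st, 1);
  }
  sqlite3_finalize(st);
  sqlite3_close(db);
  mid = lo + (hi - lo)/2;

  /* One worker per half of the rowid range. */
  Slice s[2] = { { lo, mid, 0, 0, 0 }, { mid+1, hi, 0, 0, 0 } };
  for(i=0; i<2; i++) pthread_create(&tid[i], 0, scan_slice, &s[i]);
  for(i=0; i<2; i++) pthread_join(tid[i], 0);

  /* Merge step: COUNT adds, MIN/MAX compare. */
  printf("count=%lld min=%f max=%f\n",
         (long long)(s[0].count + s[1].count),
         s[0].min < s[1].min ? s[0].min : s[1].min,
         s[0].max > s[1].max ? s[0].max : s[1].max);
  return 0;
}

The merge step is trivial for COUNT/MIN/MAX; as Christian says, anything with ORDER BY or GROUP BY needs a real merge node, which is where the complexity (and the bloat) starts. And per the Dr. Dobb's article, on a single spindle the two scans will mostly just take turns at the disk, so the win only shows up with multiple disks or a warm cache.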