You may be interested in this article:
http://www.drdobbs.com/parallel/multithreaded-file-io/220300055?pgno=2

Multi-threaded reading of multiple files (which is basically what you're 
talking about when splitting a file in half) is only faster if you have 
multiple disks (in that article's case, a RAID-5 system).

Random I/O gains a bit from threading, since concurrent requests have some 
probability of intersecting common disk blocks that are already cached.


Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Udi Karni [uka...@gmail.com]
Sent: Wednesday, August 01, 2012 2:25 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Windows I/O (was: Initial read speed greater than 
subsequent)

You are right. True Parallel Query can get very complicated. I was hoping
for something very limited for starters - for example -

- only 2 processes
- only for simple full scans where the block range can be divided in two
- only when there is no ORDER/GROUP BY, where sub-results from the 2 threads
have to be combined

Basically only for queries such as SELECT COUNT/MIN/MAX FROM TABLE WHERE....

Sounds very limited / what's-the-point kind of thing - but it would
actually be very useful when working with large data where you find
yourself doing a lot of QA and study of the data - "how many rows have this
range of codes / are null", etc.

Having 2 processes working simultaneously might cut run times in half - and
save many minutes.

Going higher than 2 might hit disk read limitations anyway - so 2 might be
plenty for version 1.

In other words - nothing grand - just a small optimization that will kick
in on simple stuff. Pick some low hanging fruit.

A "would be nice" if not too complicated.


On Wed, Aug 1, 2012 at 5:57 PM, Christian Smith <
csm...@thewrongchristian.org.uk> wrote:

> On Sat, Jul 14, 2012 at 03:17:07PM +0100, Simon Slavin wrote:
> >
> > On 14 Jul 2012, at 3:12pm, Udi Karni <uka...@gmail.com> wrote:
> >
> > > I know
> > > nothing about writing DB engines - so I don't know whether adding a 2nd
> > > parallel process adds 10K or 10M to the code base.
> >
> > You've reached the limit of what I know about parallelization.  I hope
> someone else can chime in.
>
>
> Using SQLite's VM architecture, I would guess that adding this sort of
> parallelization would be non-trivial. You need a parallel VM, significantly
> different to the current sequential VM, and at least a way of managing
> asynchronous IO, with perhaps a callback mechanism into the VM to handle IO
> completion. <shudder>
>
> While not certain, I guess other databases handle this by using tree based
> execution plans, where any single execution node can easily be split into
> branches to another thread/process/machine, then merged in the parent tree
> node, with each branch handling a certain key range.
>
> This might make sense, for example, with a partitioned table, where each
> partition is on its own spindle, so a full table scan can be executed in
> parallel on each spindle and merged as a final step. So, for a table scan
> between k0 and k3, find intermediate keys to split the query between
> spindles:
>
>         (k0-k3)
>           /|\
>          / | \
>         /  |  \
>        /   |   \
>       /    |    \
> (k0-k1] (k1-k2] (k2-k3)
>    |       |       |
> disk1    disk2    disk3
>
> I sat through an Oracle internals course once, and the instructor gave us
> an example of a setup such as this where data was partitioned across 24
> disks, and the resulting full table scans were in fact quicker than
> index-based scans for the data set they were using.
>
> Of course, the above would be useless for SQLite anyway, being a single
> file database. And even with the likes of Oracle, Stripe And Mirror
> Everything (SAME) might also largely defeat parallel scans.
>
> All in all, the added bloat would be measured in MB, rather than KB.
>
> Christian
>
> disclaimer: Not a practical DB implementation expert.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>