You are right. Disk IO - even with SSD - is typically the bottleneck.
Running multiple threads would only make sense if all the data is in RAM -
a :memory: DB or a giant RAM drive (which remains to be tested, of course,
and is pricey).
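
(For reference, this is roughly how I pull a table into a :memory: DB - a
minimal sketch using Python's sqlite3 module; 'master.db' and the 'people'
table are just placeholder names:)

import sqlite3

# Copy an on-disk table into a fresh :memory: database so later scans are
# served entirely from RAM. 'master.db' and 'people' are placeholder names.
mem = sqlite3.connect(":memory:")
mem.execute("ATTACH DATABASE 'master.db' AS disk")
mem.execute("CREATE TABLE people AS SELECT * FROM disk.people")
mem.commit()
mem.execute("DETACH DATABASE disk")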

However - I noticed that when I fully copy a master table into a :memory:
DB (on my small machine that means no more than 4-5GB - so not a very
conclusive test) - simple SELECT COUNT ... WHERE queries go through the
data at about 250 MB/sec. IO is no longer a factor, and I doubt I am
hitting bus speed, so I suspect this is truly a case of the CPU maxing out -
reading the pages from RAM and applying the WHERE criteria to the rows.
It scales very linearly: at that rate a 100GB table takes about 400 seconds,
or roughly 6.7 minutes. That's where I am hoping a second concurrent thread
could cut the time roughly in half.
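
(To make that concrete, here is a hand-rolled sketch of the kind of split I
mean - two threads, each with its own read-only connection, each counting
its half of the rowid range. It uses Python's sqlite3 module, and the file
name, table name, and AGE filter are all made up; whether the two scans
really overlap in practice depends on the driver letting them run
concurrently during the scan.)

import sqlite3
import threading

# Placeholder names throughout: 'master.db', the 'people' table, and the
# AGE = 30 filter stand in for whatever the real schema uses.
DB_URI = "file:master.db?mode=ro"

def count_half(lo, hi, results, slot):
    # Each thread opens its own read-only connection; SQLite connections
    # should not be shared across threads.
    con = sqlite3.connect(DB_URI, uri=True)
    results[slot] = con.execute(
        "SELECT COUNT(*) FROM people WHERE rowid BETWEEN ? AND ? AND AGE = 30",
        (lo, hi),
    ).fetchone()[0]
    con.close()

# Split the table by rowid: the two ranges don't overlap, so every row is
# counted exactly once even if there are gaps in the rowid sequence.
con = sqlite3.connect(DB_URI, uri=True)
(max_rowid,) = con.execute("SELECT MAX(rowid) FROM people").fetchone()
con.close()

mid = max_rowid // 2
results = [0, 0]
threads = [
    threading.Thread(target=count_half, args=(1, mid, results, 0)),
    threading.Thread(target=count_half, args=(mid + 1, max_rowid, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("total:", sum(results))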

But I understand this is completely outside the core mission of SQLite. I
know nothing about writing DB engines - so I don't know whether adding a 2nd
parallel process adds 10K or 10M to the code base. I am just hoping that as
technology progresses and small embedded devices carry more and more RAM -
it would be OK to slightly expand the footprint of SQLite and add some more
"desktop" features.

This would be so incredible. As it is - SQLite is virtually identical to
Microsoft Access, without the Microsoft price tag and footprint.
Multi-threaded capability would actually surpass it...

On Sat, Jul 14, 2012 at 6:51 AM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 14 Jul 2012, at 2:31pm, Udi Karni <uka...@gmail.com> wrote:
>
> > (4) Get enough RAM to contain all the data and swap
>
> This may be the best solution to the problem but it's disappointing.  You
> really shouldn't need to do this.  Computers and caching algorithms should
> be doing better to help you.
>
> > In light of that - is it completely out of line to consider "light
> > parallelization" for Sqlite? Even just for the simplest SELECTs (SELECT
> > COUNT (*) WHERE AGE = NNN) ? Have Sqlite spawn 2 processes - 1 counting
> > the 1st half of the blocks of a table - the second counting the 2nd half?
> > Limited only to reads where there are no locking issues? Even capped at 2
> > threads? Only for tables greater than xGB so as not to bother with the
> > tiny ones? Would that introduce bloat in the code and detract from Sqlite's
> > original mission?
>
> Parallelization is of great advantage when the problem is processing:
> maths, pattern-searching, and processing of lots of data already in memory.
>  But SQLite is very efficient at doing this, with a very simple very
> searchable file format.  I think your bottleneck here isn't processing,
> it's I/O, just as you put in the 'Subject' header.  I bet your limitation
> is in your bus width, throughput, or low-level file handling.  If you
> implement parallelization the way you describe, the result will just be the
> two commands constantly fighting over access to your datastore -- back to
> bandwidth and throughput again.  Just to give you an idea, in normal setups
> the normal bottleneck for SQLite speed is the rotational speed of a hard
> disk.
>
> Simon.