You are right. Disk IO, even with an SSD, is typically the bottleneck. Running multiple threads would only make sense if all the data were in RAM: a :memory: DB or a giant RAM drive (which remains to be tested, of course, and is pricey).
However, I noticed that when fully copying a master table into a :memory: DB (on my small machine that means no more than 4-5 GB, so not a very conclusive test), simple SELECT COUNT ... WHERE queries go through the data at about 250 MB/sec. IO is no longer a factor, and I doubt I am hitting bus speed, so I suspect this is truly a case of the CPU maxing out: reading the pages from RAM and applying the WHERE criteria to the rows. It's very linear; at that rate a 100 GB table takes about 400 seconds, or just under 7 minutes. That's where I am hoping a second concurrent thread could cut the time roughly in half.

But I understand this is not at all the core mission of SQLite. I know nothing about writing DB engines, so I don't know whether adding a second parallel process adds 10K or 10M to the code base. I'm just hoping that as technology progresses and small embedded devices carry more and more RAM, it would be OK to slightly expand the footprint of SQLite and add some more "desktop" features. This would be so incredible. As it is, SQLite is virtually identical to Microsoft Access without the Microsoft price tag and footprint. Multi-threaded capability would actually surpass it...

On Sat, Jul 14, 2012 at 6:51 AM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 14 Jul 2012, at 2:31pm, Udi Karni <uka...@gmail.com> wrote:
>
> > (4) Get enough RAM to contain all the data and swap
>
> This may be the best solution to the problem but it's disappointing. You
> really shouldn't need to do this. Computers and caching algorithms should
> be doing better to help you.
>
> > In light of that - is it completely out of line to consider "light
> > parallelization" for Sqlite? Even just for the simplest SELECTs (SELECT
> > COUNT (*) WHERE AGE = NNN)? Have Sqlite spawn 2 processes - 1 counting
> > the 1st half of the blocks of a table - the second counting the 2nd
> > half? Limited only to reads where there are no locking issues? Even
> > capped at 2 threads?
> > Only for tables greater than xGB so as not to bother with the tiny
> > ones? Would that introduce bloat in the code and detract from Sqlite's
> > original mission?
>
> Parallelization is of great advantage when the problem is processing:
> maths, pattern-searching, and processing of lots of data already in
> memory. But SQLite is very efficient at doing this, with a very simple
> very searchable file format. I think your bottleneck here isn't
> processing, it's I/O, just as you put in the 'Subject' header. I bet your
> limitation is in your bus width, throughput, or low-level file handling.
> If you implement parallelization the way you describe, the result will
> just be the two commands constantly fighting over access to your
> datastore -- back to bandwidth and throughput again. Just to give you an
> idea, in normal setups the normal bottleneck for SQLite speed is the
> rotational speed of a hard disk.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
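
P.S. For anyone curious, here is a minimal application-level sketch of the kind of split I have in mind, written in Python rather than inside SQLite itself: two threads, each with its own connection, each counting one half of the rowid range. The table name, column, and file name ("master", "age", "demo_master.db") are made up for illustration, and the demo table is tiny; on a real multi-gigabyte file on disk the two scans would contend for the same I/O, exactly as Simon points out, so this only has a chance of helping when the data is fully cached or in RAM.

```python
# Hypothetical sketch (not SQLite internals): split a COUNT(*) scan across
# two threads at the application level by partitioning the rowid range.
# Table/column/file names ("master", "age", "demo_master.db") are made up.
import os
import sqlite3
import threading

DB = "demo_master.db"

def setup_demo():
    # Tiny stand-in for the multi-GB "master" table, so the sketch runs.
    if os.path.exists(DB):
        os.remove(DB)
    con = sqlite3.connect(DB)
    con.execute("CREATE TABLE master (age INTEGER)")
    con.executemany("INSERT INTO master (age) VALUES (?)",
                    [(i % 100,) for i in range(10000)])
    con.commit()
    con.close()

def count_half(lo, hi, age, out, idx):
    # Each thread opens its own connection. CPython's sqlite3 module
    # releases the GIL while a statement runs, so the two scans can
    # genuinely overlap on two cores.
    con = sqlite3.connect(DB)
    row = con.execute(
        "SELECT COUNT(*) FROM master"
        " WHERE rowid > ? AND rowid <= ? AND age = ?",
        (lo, hi, age)).fetchone()
    out[idx] = row[0]
    con.close()

def parallel_count(age):
    # Split the table at the midpoint of the rowid range; the two
    # half-open intervals (0, mid] and (mid, max] neither overlap nor
    # leave a gap, since rowids start at 1.
    con = sqlite3.connect(DB)
    max_rowid = con.execute(
        "SELECT MAX(rowid) FROM master").fetchone()[0] or 0
    con.close()
    mid = max_rowid // 2
    out = [0, 0]
    threads = [
        threading.Thread(target=count_half, args=(0, mid, age, out, 0)),
        threading.Thread(target=count_half,
                         args=(mid, max_rowid, age, out, 1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[0] + out[1]
```

Note that a plain :memory: database would not work here as written, because each connection would see its own private in-memory DB; a shared-cache URI ("file::memory:?cache=shared") or an OS RAM disk would be needed instead.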