You didn't show your timing results or say what kind of machine you're running 
on.
I'm also seeing
Calculating Subset Sample...
SQL error (635): near "ORDER": syntax error

I'm running Linux, sqlite 3.7.5, E5520 2.27Ghz Intel with 16 threads possible.

Here's my timings just on the first few rows of your timing output shows 2 
threads is a touch better.
1 thread
( 0) Rows Proc:     5000 Avg: 0.00144 Elapsed:       7.20
( 0) Rows Proc:    10000 Avg: 0.00104 Elapsed:      10.37
( 0) Rows Proc:    15000 Avg: 0.00098 Elapsed:      14.77
( 0) Rows Proc:    20000 Avg: 0.00092 Elapsed:      18.36
( 0) Rows Proc:    25000 Avg: 0.00084 Elapsed:      21.08

2 threads
( 1) Rows Proc:     5000 Avg: 0.00103 Elapsed:       5.20
( 1) Rows Proc:    10000 Avg: 0.00104 Elapsed:      10.45
( 1) Rows Proc:    15000 Avg: 0.00103 Elapsed:      15.53
( 0) Rows Proc:     5000 Avg: 0.00419 Elapsed:      20.96
( 1) Rows Proc:    20000 Avg: 0.00106 Elapsed:      21.23

3 threads
( 1) Rows Proc:     5000 Avg: 0.00536 Elapsed:      26.91
( 2) Rows Proc:     5000 Avg: 0.00598 Elapsed:      30.05
( 1) Rows Proc:    10000 Avg: 0.00535 Elapsed:      53.67
( 2) Rows Proc:    10000 Avg: 0.00612 Elapsed:      61.36
( 1) Rows Proc:    15000 Avg: 0.00537 Elapsed:      80.75

What I would do is one query and split the results of that query between 
threads.

You can use OpenMP to do that.


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Seth Price [s...@pricepages.org]
Sent: Wednesday, August 03, 2011 9:07 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Threading makes SQLite 3x slower??


Hey all, I have a small DB bound application that I'm working on. I expect that 
it will take 10-20 days to run when I'm done with it. So I'm trying to make it 
multithreaded. But after spending all afternoon getting threading going, it 
runs on the order of 3x slower per query when I go from one to two threads.

Is this expected?

The full DB has around 17 million rows in it, and for each row I am trying to 
count all rows with similar characteristics, divided into different 
classifications. I was already able to improve speed 10x by using the R*tree 
extension to narrow my search. My queries look like this:

SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum) WHERE 57 < 
col0min AND col0max < 61 AND 52 < col1min AND col1max < 56 AND 66 < col2min AND 
col2max < 70 AND 88 < col3min AND col3max < 92 AND 133 < col4min AND col4max < 
137 AND 57 < col0 AND col0 < 61 AND 52 < col1 AND col1 < 56 AND 66 < col2 AND 
col2 < 70 AND 88 < col3 AND col3 < 92 AND 133 < col4 AND col4 < 137 AND 81 < 
col5 AND col5 < 85 GROUP BY class;

They take around 0.04 seconds each (times 17 million). I thought that I should 
be able to access the DB from each thread without penalty because they are 
read-only queries. I also tried making copies of the DB file to access a 
different file with each thread (also slower). Oddly, there are no frequent 
disk access while I run the program, so there must be caching somewhere.

How can I make threading work? Download the example set and code here:
http://seth.bluezone.usu.edu/sqlite/

There is info on how I compile and run the program in the header. Use 
NUM_THREADS to change the number. The stats on which thread is taking how much 
time is printed under "Pruning Conflicting Examples..." and the threading code 
is under "pruneEx(ExPtr ex)". I'm on a Mac Pro running 10.6.8.

Thoughts? Comments? Ideas?

Thanks,
Seth
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to