Maksim Yevmenkin wrote:

i think, i know what is going on here. the problem is that every time
i do an indexed scan sqlite has to

1) fetch index pages

2) fetch data pages that match "where" condition

because both index and data are in the same file sqlite has to perform
insane amount of seek() calls and this is clearly demonstrated in my
previous post.

even if i remote the sum() but just add extra column that is not in
the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a <= 18234721 GROUP BY a);



time sqlite3 db < test.sql


300000
1024
1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whooping seconds to "group by" 300,000+ records
down to 1417 records and that is just unacceptable (imo) for
relational database.


If seeking was the problem, you'd see more system time. The i/o time is at most 1.06 seconds. The file is probably
being cached in the os, and therefore, the seeks just become memory operations.


You might also get more help if you presented your requests in a milder tone. You might even get responses
from the people who wrote sqlite, and therefore have knowledge of its internals.


but one thing to try:

CREATE TABLE data(
 a INTEGER,
 b INTEGER,
 c CHAR,
 d INTEGER,
 e INTEGER,
 n1 FLOAT,
 n2 FLOAT
);
CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);


This is a big waste of space, and will slow down all your updates and inserts, but since you
seem to be mostly worried about the speed of selects, those shouldn't make a difference to you.


You may already know this, but when you create an index with multiple keys, it can use a subset
of those keys to look up rows. The limitation is that any key that appears in the index before a column
you would like to limit must also be limited, e.g.,


select * from a where a = 12 and b = 16 and c = 16

will use the index, but

select
*
from a
where b = 22 and c = 23



will not use the index.

You claim that the reason sqlite is slower than you expect is because sqlite is using the index to
look up the data. If so, this should fix the problem, by making the index have all the data right
there.


I'm not so sure that this will resolve your problems, but it may take you closer to a solution. Another
thing to try is to try the same experiment with postgresql or mysql, and see if they have the same sort
of performance you're expecting to see. If not, you may want to consider that Thomas was right, and
a general solution may not be able to perform the calculations with the same speed that you are hoping
for.


John LeSueur



Reply via email to