Re: [sqlite] sqlite performance problem

John LeSueur Tue, 12 Apr 2005 16:51:20 -0700

Maksim Yevmenkin wrote:

i think, i know what is going on here. the problem is that every time
i do an indexed scan sqlite has to

1) fetch index pages

2) fetch data pages that match "where" condition

because both index and data are in the same file sqlite has to perform
insane amount of seek() calls and this is clearly demonstrated in my
previous post.

even if i remote the sum() but just add extra column that is not in
the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a <= 18234721 GROUP BY a);

time sqlite3 db < test.sql

300000
1024
1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whooping seconds to "group by" 300,000+ records down to 1417 records and that is just unacceptable (imo) for relational database.

If seeking was the problem, you'd see more system time. The i/o time is at most 1.06 seconds. The file is probably being cached in the os, and therefore, the seeks just become memory operations.

You might also get more help if you presented your requests in a milder tone. You might even get responses from the people who wrote sqlite, and therefore have knowledge of its internals.

but one thing to try:

CREATE TABLE data(
 a INTEGER,
 b INTEGER,
 c CHAR,
 d INTEGER,
 e INTEGER,
 n1 FLOAT,
 n2 FLOAT
);
CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);

This is a big waste of space, and will slow down all your updates and inserts, but since you seem to be mostly worried about the speed of selects, those shouldn't make a difference to you.

You may already know this, but when you create an index with multiple keys, it can use a subset of those keys to look up rows. The limitation is that any key that appears in the index before a column you would like to limit must also be limited, e.g.,

select * from a where a = 12 and b = 16 and c = 16

will use the index, but

select * from a where b = 22 and c = 23


will not use the index.

You claim that the reason sqlite is slower than you expect is because sqlite is using the index to look up the data. If so, this should fix the problem, by making the index have all the data right there.

I'm not so sure that this will resolve your problems, but it may take you closer to a solution. Another thing to try is to try the same experiment with postgresql or mysql, and see if they have the same sort of performance you're expecting to see. If not, you may want to consider that Thomas was right, and a general solution may not be able to perform the calculations with the same speed that you are hoping for.

John LeSueur

Re: [sqlite] sqlite performance problem

Reply via email to