John.

> >i think i know what is going on here. the problem is that every time
> >i do an indexed scan sqlite has to
> >
> >1) fetch index pages
> >
> >2) fetch data pages that match "where" condition
> >
> >because both the index and the data are in the same file, sqlite has
> >to perform an insane amount of seek() calls, and this is clearly
> >demonstrated in my previous post.
> >
> >even if i remove the sum() and just add an extra column that is not in
> >the index, like so
> >
> >SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a <= 18234721 GROUP BY a);
> >
> >>time sqlite3 db < test.sql
> >
> >300000
> >1024
> >1417
> >13.14u 1.06s 0:14.40 98.6%
> >
> >it still takes 14 whopping seconds to "group by" 300,000+ records
> >down to 1417 records, and that is just unacceptable (imo) for a
> >relational database.
> >
> >
> If seeking was the problem, you'd see more system time. The i/o time is
> at most 1.06 seconds. The file is probably
> being cached in the os, and therefore, the seeks just become memory
> operations.

i agree that most of the data is cached by the filesystem. it would be
even slower if sqlite had to go and fetch all the data from the disk
platters. i'm not sure whether system time includes time the process
spends asleep waiting on, say, disk i/o.

> You might also get more help if you presented your requests in a milder
> tone. You might even get responses
> from the people who wrote sqlite, and therefore have knowledge of its
> internals.

i did not intend to offend anyone; if i did, i apologize.

> but one thing to try:
> 
> CREATE TABLE data(
>   a INTEGER,
>   b INTEGER,
>   c CHAR,
>   d INTEGER,
>   e INTEGER,
>   n1 FLOAT,
>   n2 FLOAT
> );
> CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);
> 
> This is a big waste of space, and will slow down all your updates and
> inserts, but since you
> seem to be mostly worried about the speed of selects, those shouldn't
> make a difference to you.

that will not work for me either. i have to fit a 60 million row table
in under 15 gig.
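back of the envelope: 15 gig over 60 million rows leaves only about 250
bytes per row for everything, so duplicating all seven columns into the
index would roughly double the per-row footprint (a rough estimate that
ignores page and b-tree overhead).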

> You may already know this, but when you create an index with multiple
> keys, it can use a subset
> of those keys to look up rows. The limitation is that any key that
> appears in the index before a column
> you would like to limit must also be limited, e.g.,
> 
> select
>     *
> from
>     a
> where
>     a = 12 and b = 16 and c = 16
> 
> will use the index, but
> 
> select
>     *
> from
>     a
> where
>     b = 22 and c = 23
> 
> will not use the index.

yes, i know this. i do not see how it applies to my tests.
 
> You claim that the reason sqlite is slower than you expect is because
> sqlite is using the index to
> look up the data. If so, this should fix the problem, by making the
> index have all the data right
> there.

that is not what i said. the index properly gives me a much smaller (30
times) data set. however, the column i have an index on is only one of
the columns i want to fetch; the other columns are still on disk, and in
my case fetching those other columns is what takes the time. again, i'd
rather not put an index on all the columns because i might not meet the
size requirements.
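
for the query i posted earlier, maybe a narrower covering index would fit
the size budget, something like (just a sketch, the index name is made up
and i have not measured whether sqlite actually avoids the table with it):

-- cover only the two columns that particular query touches,
-- instead of all seven columns suggested above
CREATE INDEX data_by_a_n1 ON data (a, n1);

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a <= 18234721 GROUP BY a);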

> I'm not so sure that this will resolve your problems, but it may take
> you closer to a solution. Another
> thing to try is to try the same experiment with postgresql or mysql, and
> see if they have the same sort
> of performance you're expecting to see. If not, you may want to consider

i will try another database. it's just that i've already spent too much
time trying to make sqlite work :( i still hope i'm doing something
wrong here, i just don't know what it is :)

> that Thomas was right, and
> a general solution may not be able to perform the calculations with the
> same speed that you are hoping
> for.

but i'm not doing anything out of the ordinary here: select rows, sum
some columns, and join a couple of other tables. why would sqlite not
work for me?
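
roughly the shape of what i run, with the joined tables and their columns
made up here since i only posted the "data" table so far:

SELECT d.a, SUM(d.n1), SUM(d.n2)
FROM data d
JOIN dim1 ON dim1.id = d.b
JOIN dim2 ON dim2.id = d.c
WHERE d.a <= 18234721
GROUP BY d.a;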

thanks,
max
