> well, it could be true, but not in the queries i have posted. i "group
> by" column "a" and there is an index on column "a", so sqlite does not
> have to do anything to compute the key. it does not even have to go back

   Do not confuse the index key with the aggregator key.  The two may be
the same in your case, but in general they won't be, so I wouldn't
expect SQLite to reuse the index key as the aggregator key even if it
could.
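
   To see where the two diverge, compare a GROUP BY on the indexed
column with a GROUP BY on an expression.  A minimal Python sketch (the
table and column names are just assumptions matching your description,
and the exact plan wording varies between SQLite versions):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a INTEGER, n1 INTEGER)")
    con.execute("CREATE INDEX t_a ON t (a)")

    # GROUP BY on the indexed column: SQLite can walk the index in key
    # order, so equal keys arrive together and no extra key is built.
    for row in con.execute("EXPLAIN QUERY PLAN "
                           "SELECT a, sum(n1) FROM t GROUP BY a"):
        print(row)

    # GROUP BY on an expression: the index key can no longer serve as
    # the aggregator key, so SQLite builds its own grouping key (the
    # plan typically shows a temp b-tree for the GROUP BY).
    for row in con.execute("EXPLAIN QUERY PLAN "
                           "SELECT a % 10, sum(n1) FROM t GROUP BY a % 10"):
        print(row)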

> the disk and fetch the value of column "a" because its right in the
> index. the only work sqlite has to do is to go back to the disk and
> fetch the value of column "n1" and sum it.

   You're underestimating how much work has to be done to just "fetch
the value of column n1".  The entire row containing n1 has to be read,
then the value of column n1 extracted from it.  That means disk seeks,
disk reads, moving around within the data row, etc.
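
   If the query really only ever needs a and n1, the standard fix is a
covering index, i.e. an index that carries n1 as well, so the table row
never has to be fetched at all.  A rough sketch (schema and row counts
are invented here; on an on-disk database the gap is much larger than
in memory):

    import sqlite3, random, time

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a INTEGER, n1 INTEGER, pad TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                    ((random.randrange(1500), random.randrange(100), "x" * 200)
                     for _ in range(300000)))

    def timed(label):
        t0 = time.perf_counter()
        con.execute("SELECT a, sum(n1) FROM t GROUP BY a").fetchall()
        print(label, round(time.perf_counter() - t0, 3), "s")

    con.execute("CREATE INDEX t_a ON t (a)")
    timed("index on (a) only     ")  # index scan plus a table fetch per row

    con.execute("DROP INDEX t_a")
    con.execute("CREATE INDEX t_a_n1 ON t (a, n1)")
    timed("covering index (a, n1)")  # aggregation runs off the index alone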

> what you say is correct, but four (!) times performance increase?
> please, you have got to admit that something is not right here.

   It may well be possible to make your query run faster.  The biggest
problem here is simply that you're expecting a magically
high-performance solution without understanding what needs to be done in
order to satisfy your request.

> why? i'm using a relational database to do exactly what it should do
> fast. that is, i want to select a range of rows from a much bigger set

   Do not confuse what you want it to do fast, or even what you think it
should do fast, with what it actually does or how fast it does it.

> and do something with it. i'm not asking it to do complex computations
> or anything else.

   You don't think you're doing anything complex, but you really are;
you just don't understand that you are.  Locating a variable-sized row
in an unpredictable location and performing arbitrary operations on some
sub-section of that row is a heck of a lot more complicated than
scanning through a flat file and computing hash keys.
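
   Spelled out, the flat-file version of your aggregation is just one
sequential pass and a hash table; something like this Python sketch
(the file name and layout are hypothetical):

    from collections import defaultdict

    sums = defaultdict(int)
    with open("data.txt") as f:   # hypothetical "a n1" pairs, one per line
        for line in f:
            a, n1 = line.split()
            sums[a] += int(n1)

No seeks, no b-tree traversal, no row decoding beyond a split() - which
is exactly why it isn't a fair baseline for an indexed database lookup.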

> i think i know what is going on here. the problem is that every time
> i do an indexed scan sqlite has to
> 
> 1) fetch index pages
> 
> 2) fetch data pages that match "where" condition
> 
> because both index and data are in the same file sqlite has to perform
> an insane amount of seek() calls, and this is clearly demonstrated in
> my previous post.

   Once again, the fact that the data and the index are in the same file
is irrelevant.  Even if the data and the index were in separate files,
the disk heads would be doing a lot of random I/O to move from the
location of one row to the next given that the table rows are not
accessed sequentially when using an index.
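
   You can watch this tradeoff directly by forcing each access path.
A sketch, reusing the hypothetical t/t_a schema from above (INDEXED BY
and NOT INDEXED are standard SQLite syntax in current versions):

    import sqlite3

    con = sqlite3.connect("test.db")  # hypothetical on-disk copy of the data

    # Index-ordered: walk t_a, then seek back into the table for every
    # matching row (random I/O whether or not index and data share a file).
    con.execute("SELECT a, sum(n1) FROM t INDEXED BY t_a GROUP BY a").fetchall()

    # Sequential: scan the table pages in storage order and let SQLite
    # build its own grouping key (mostly linear I/O).
    con.execute("SELECT a, sum(n1) FROM t NOT INDEXED GROUP BY a").fetchall()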

> it still takes 14 whopping seconds to "group by" 300,000+ records
> down to 1417 records, and that is just unacceptable (imo) for a
> relational database.

   Why does the fact that SQLite is a relational database mean that
response times should be measured in seconds?  Have you tried this with
any other relational database, as another poster suggested?  You may
well get better performance out of something like MySQL or Oracle, but
it'll come at a price - much more memory usage and an SMP system that
allows you to effectively multithread.  I can guarantee you that any
other relational database, if configured to use the same amount of
memory as SQLite (a few MBs) and the same number of threads (one),
would perform markedly *worse* than SQLite.

> from what i can see sqlite took an extra 7 seconds to fetch the data
> off the disk. that is, it had to seek back to the data page, fetch it,
> process it, seek back to the next index page, etc.

   That's correct.  That's the price you pay for using an index to look
up the data.

> yes, i tried all that. it is not that easy to read sqlite explain
> output (imo) but i convinced myself that it uses all the proper indexes
> when i make the query.

   Just because you think it's using the correct index doesn't mean that
that's the optimal access path.  Assuming that it is, however... Given
that you do seem to understand the overhead incurred by using an index
to look up data that is located elsewhere, what's the issue?  It seems
pretty clear that SQLite isn't doing anything wrong.  It isn't doing it
as optimally as your C program, but we've already covered the reasons
for that, so what's left to discuss?
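
   If the raw EXPLAIN opcode dump is hard to read, EXPLAIN QUERY PLAN
gives a one-line-per-table summary that is much easier to sanity-check.
A sketch (database and table names assumed as before):

    import sqlite3

    con = sqlite3.connect("test.db")  # hypothetical database file
    for row in con.execute("EXPLAIN QUERY PLAN "
                           "SELECT a, sum(n1) FROM t GROUP BY a"):
        print(row)
    # If the plan mentions an index but not a COVERING one, every row
    # found in the index still costs a trip back to the table.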

> i think sqlite can only do a good job when the resulting set is very
> small. ideally one row. in this case it does not have to go back to
> disk to fetch the columns that are not in the index. it is that simple
> (imo). the more columns (that are not in the index) you request the
> more time it takes.

   Why would you expect the processing of a large number of rows to take
the same amount of time as processing a single row?  Once again, your
expectations are unreasonable.  It's going to take more time to process
a large number of rows simply because there's more data to be managed.
   The number of columns requested isn't much of a factor either.
Requesting 10 columns that are not in the index is only slightly more
expensive than requesting 1 - the cost is in having to go back to the
table to retrieve the entire table row, not in extracting the columns
within it.  Not having to go back to the table at all is ideal, of
course, but generally not realistic, as most often you can't have an
index containing every piece of data every individual query needs
without using so much space and impacting insert speed so dramatically
that the whole database becomes unusable.  This is not a problem that's
isolated to SQLite though - all relational databases (that I'm aware of
anyway) work this way and have these problems.
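
   That cost is easy to measure for yourself: every extra or wider
index must be updated on each INSERT.  A rough sketch (sizes and schema
invented for illustration):

    import sqlite3, random, time

    def insert_time(index_sql=None):
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE t (a INTEGER, n1 INTEGER, n2 INTEGER, s TEXT)")
        if index_sql:
            con.execute(index_sql)
        rows = [(random.randrange(1500), i, i, "x" * 50) for i in range(100000)]
        t0 = time.perf_counter()
        con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        con.commit()
        return round(time.perf_counter() - t0, 3)

    print("no index:  ", insert_time(), "s")
    print("wide index:", insert_time("CREATE INDEX t_all ON t (a, n1, n2, s)"), "s")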

> i'm basically getting the same 6,7 seconds. however in order to create
> that small subset you have to do a 14+ second query :(

   The number of rows in the resultset isn't what dictates performance,
it's the number of rows that must be inspected in order to create that
resultset.

   -Tom
