> well, it could be true, but not in the queries i have posted. i "group
> by" column "a" and there is an index on column "a", so sqlite does not
> have to do anything to compute key. it does not even have to back to

Do not confuse the index key with the aggregator key. The two may be the same in your case, but in general they won't be, so I wouldn't expect SQLite to reuse the index key as the aggregator key even if it could.

> the disk and fetch the value of column "a" because its right in the
> index. the only work sqlite has to do is to go back to the disk and
> fetch the value of column "n1" and sum it.

You're underestimating how much work it takes just to "fetch the value of column n1". The entire row containing n1 has to be read, and then the value of column n1 extracted from it. That means disk seeks, disk reads, moving around within the data row, and so on.

> what you say is correct, but four (!) times performance increase?
> please, you have got to admit that something is not right here.

It may well be possible to make your query run faster. The biggest problem here is simply that you're expecting a magically high-performance solution without understanding what needs to be done in order to satisfy your request.

> why? i'm using relational database to do exactly what it should do
> fast. that is i want select the range of rows from a much bigger set

Do not confuse what you want it to do fast, or even what you think it should do fast, with what it actually does or how fast it does it.

> and do something with it. i'm not asking it to do complex computations
> or anything else.

You don't think you're doing anything complex, but you really are; you just don't realize it. Locating a variable-sized row at an unpredictable location and performing arbitrary operations on some sub-section of that row is a heck of a lot more complicated than scanning through a flat file and computing hash keys.

> i think, i know what is going on here.
> the problem is that every time
> i do an indexed scan sqlite has to
>
> 1) fetch index pages
>
> 2) fetch data pages that match "where" condition
>
> because both index and data are in the same file sqlite has to perform
> insane amount of seek() calls and this is clearly demonstrated in my
> previous post.

Once again, the fact that the data and the index are in the same file is irrelevant. Even if they were in separate files, the disk heads would still be doing a lot of random I/O moving from the location of one row to the next, because table rows are not accessed sequentially when the table is reached through an index.

> it still takes 14 whooping seconds to "group by" 300,000+ records
> down to 1417 records and that is just unacceptable (imo) for
> relational database.

Why does the fact that SQLite is a relational database mean that response times should be measured in seconds? Have you tried this with any other relational database, as another poster suggested? You may well get better performance out of something like MySQL or Oracle, but it will come at a price: much higher memory usage and an SMP system that lets you multithread effectively. I can guarantee you that any other relational database, configured to use the same amount of memory as SQLite (a few MB) and the same number of threads (one), would perform markedly *worse* than SQLite.

> from what i can see sqlite took extra 7 seconds to fetch the data of
> the disk. that is it had to seek back to data page, fetch it, process
> it, seek back to next index page etc.

That's correct. That's the price you pay for using an index to look up the data.

> yes, i tried all that. it is not that easy to read sqlite explain
> output (imo) but i convince myself that it uses all the proper indexes
> when i make the query.

Just because you think it's using the correct index doesn't mean that that's the optimal access path. Assuming that it is, however...
Given that you do seem to understand the overhead incurred by using an index to look up data that is located elsewhere, what's the issue? It seems pretty clear that SQLite isn't doing anything wrong. It isn't doing it as optimally as your C program, but we've already covered the reasons for that, so what's left to discuss?

> i think sqlite can only do good job when resulting set is very small.
> ideally one row. in this case it does not have to go back to disk to
> fetch the columns that are not in index. it is that simple (imo). the
> more columns (that are not in the index) you request the more time it
> takes.

Why would you expect processing a large number of rows to take the same amount of time as processing a single row? Once again, your expectations are unreasonable. Processing a large number of rows takes more time simply because there is more data to manage.

The number of columns requested isn't much of a factor, either. Requesting ten columns that are not in the index is only slightly more expensive than requesting one: the cost is in going back to the table to retrieve the entire row, not in extracting the columns from it. Not having to go back to the table at all is ideal, of course, but generally not realistic, since most often you can't have an index containing every piece of data that every individual query needs without using so much space, and slowing inserts so dramatically, that the whole database becomes unusable. This isn't a problem isolated to SQLite, though; all relational databases (that I'm aware of, anyway) work this way and have these problems.

> i'm basically getting the same 6,7 seconds. however in order to create
> that small subset you have to do sa 14+ seconds query :(

The number of rows in the resultset isn't what dictates performance; it's the number of rows that must be inspected in order to create that resultset.

-Tom
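P.S. A small self-contained sketch of both points, using Python's built-in sqlite3 module. The real schema was never posted in this thread, so the table `t(a, n1)` and the index names below are assumptions shaped roughly like the query being discussed. It shows that the cost is driven by the rows inspected, not the rows returned, and that for this particular query an index covering both columns can eliminate the seeks back into the table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, n1 INTEGER)")
# 100,000 rows, but only 10 distinct grouping keys.
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i % 10, 1) for i in range(100_000)])
con.execute("CREATE INDEX idx_a ON t(a)")

query = "SELECT a, SUM(n1) FROM t GROUP BY a"

# The resultset is tiny (10 rows), but all 100,000 rows had to be
# read from the table to compute the sums -- that is where the time
# goes, not in returning the 10 result rows.
rows = con.execute(query).fetchall()
print(len(rows))

# With n1 included in the index, the whole query can be answered from
# the index pages alone, with no seeks back into the table.
con.execute("CREATE INDEX idx_a_n1 ON t(a, n1)")
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)  # recent SQLite versions typically say "USING COVERING INDEX"

rows2 = con.execute(query).fetchall()
```

Whether that covering index is worth its extra disk space and insert overhead is, of course, exactly the trade-off described above.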