> with sum(n1) added query runs twice as slow. as i was told its because > sqlite has to fetch data row. fine, but why its soo slow?! and it
Because for each row it has to compute the aggregate key, find the aggregator for that key and increment the sum for that aggregate key. That's a lot more work than just selecting the row and doing nothing with it. > so, just with plain ascii file i get four times the speed i get with > sqlite. note that my c program will scale linearly with the size of > dataset (just like i see with sqlite). With anything related to computers, there are always tradeoffs - most commonly power for complexity, and flexibility for speed. Your C program *should* be faster than anything SQLite can do - it's simpler and more specific to the problem you're trying to solve. On the flip side, it'll never do anything other than what it already does - it can never be used to solve any other problem. > - what am i doing wrong here? Your expectations are too high, for starters. For reasons I will never understand, people expect a relational database to be a silver bullet that answers all questions instantaneously. RDBMSs are useful because they're flexible, but they're also not 100% optimal because they're flexible. That's the price you pay for being able to answer questions with SQL instead of C. The other problem you may be running into is lack of a clear problem definition. So far I've seen mention of a half dozen things which you think should perform better, but you haven't dug into any of them to find out *why* they're slow. An earlier poster helped identify disk I/O as a bottleneck. If from there you can prove that that's a bottleneck because SQLite is performing too much disk I/O, then you have cause to claim that SQLite is slow. Otherwise, all you can do is blame your disks. Until you get a clear picture of what you want, what you need and what's actually happening, however, you'll never be able to tell the difference. > - any suggestions on how to speed things up with sqlite? First, you must understand what SQLite is doing. Have you tried EXPLAINing the query that you're unhappy with to see what SQLite is going with it "under the hood"? Also, an earlier poster mentioned increasing your block size. How high did you raise it? I've seen significant performance increases with block sizes of 8k and 16k; just how much difference it makes seems to depend on the system in question, however. On some systems I've tested, 8k block sizes are faster than 16k block sizes; my guess is that all this comes down to the block size used by the file system and/or OS (and the size of the data itself can factor in there as well), though I haven't yet dug deep enough to be sure. From my experience though, in most cases, there are certainly gains to be had by using larger block sizes. > - is sqlite optimized to retrieve one row from the table? If there is an index on the column or columns referenced in your WHERE clause, yes. Moreso if the column or columns referenced contain unique values. Note however that this has nothing to do with SQLite - all relational databases (all databases with B-tree indexes, actually, not just RDBMSs) work this way. > i'm starting to think that having index and data in the same file is > not such a great idea when you want to perform range requests. Having the data and the index in the same file is irrelevant. It is entirely possible that SQLite might be using an index to satisfy your query, and entirely possible that it would be more optimal to execute the query without using the index, but the fact that the index and data are in the same file has nothing to do with it. Whether to use the index or not is an age-old problem that other RDBMSs solve using cost-based optimizers; I'm not sure if there's a solution for that in SQLite, as I believe query plan execution is rule-based (I'm not 100% sure about that though, someone please correct me if I'm wrong). If that's the case, then you'll need to try to find another way to write your query such that SQLite executes it more optimally. If I remember correctly, the old trick to make sure you didn't use an index was to change your WHERE clause slightly to something like: SELECT COUNT(*) FROM (SELECT a FROM data WHERE (a + 0) <= 18234721 GROUP BY a); This should force a table scan rather than an index scan. Try that and see how your queries perform; if things go faster, that means the index is hurting you, and you need to either drop it or rewrite your queries to avoid using it. -Tom