> with sum(n1) added query runs twice as slow. as i was told its because
> sqlite has to fetch data row. fine, but why its soo slow?! and it

   Because for each row it has to compute the aggregate key, find the
aggregator for that key and increment the sum for that aggregate key.
That's a lot more work than just selecting the row and doing nothing
with it.

> so, just with plain ascii file i get four times the speed i get with
> sqlite. note that my c program will scale linearly with the size of
> dataset (just like i see with sqlite).

   With anything related to computers, there are always tradeoffs - most
commonly power for complexity, and flexibility for speed.  Your C
program *should* be faster than anything SQLite can do - it's simpler
and more specific to the problem you're trying to solve.  On the flip
side, it'll never do anything other than what it already does - it can
never be used to solve any other problem.

> - what am i doing wrong here?

   Your expectations are too high, for starters.  For reasons I will
never understand, people expect a relational database to be a silver
bullet that answers all questions instantaneously.  RDBMSs are useful
because they're flexible, but they're also not 100% optimal because
they're flexible.  That's the price you pay for being able to answer
questions with SQL instead of C.
   The other problem you may be running into is lack of a clear problem
definition.  So far I've seen mention of a half dozen things which you
think should perform better, but you haven't dug into any of them to
find out *why* they're slow.  An earlier poster helped identify disk I/O
as a bottleneck.  If from there you can prove that that's a bottleneck
because SQLite is performing too much disk I/O, then you have cause to
claim that SQLite is slow.  Otherwise, all you can do is blame your
disks.  Until you get a clear picture of what you want, what you need
and what's actually happening, however, you'll never be able to tell the
difference.

> - any suggestions on how to speed things up with sqlite?

   First, you must understand what SQLite is doing.  Have you tried
EXPLAINing the query that you're unhappy with to see what SQLite is
going with it "under the hood"?

   Also, an earlier poster mentioned increasing your block size.  How
high did you raise it?  I've seen significant performance increases with
block sizes of 8k and 16k; just how much difference it makes seems to
depend on the system in question, however.  On some systems I've tested,
8k block sizes are faster than 16k block sizes; my guess is that all
this comes down to the block size used by the file system and/or OS (and
the size of the data itself can factor in there as well), though I
haven't yet dug deep enough to be sure.  From my experience though, in
most cases, there are certainly gains to be had by using larger block
sizes.

> - is sqlite optimized to retrieve one row from the table?

   If there is an index on the column or columns referenced in your
WHERE clause, yes.  Moreso if the column or columns referenced contain
unique values.  Note however that this has nothing to do with SQLite -
all relational databases (all databases with B-tree indexes, actually,
not just RDBMSs) work this way.

> i'm starting to think that having index and data in the same file is
> not such a great idea when you want to perform range requests.

   Having the data and the index in the same file is irrelevant.  It is
entirely possible that SQLite might be using an index to satisfy your
query, and entirely possible that it would be more optimal to execute
the query without using the index, but the fact that the index and data
are in the same file has nothing to do with it.  Whether to use the
index or not is an age-old problem that other RDBMSs solve using
cost-based optimizers; I'm not sure if there's a solution for that in
SQLite, as I believe query plan execution is rule-based (I'm not 100%
sure about that though, someone please correct me if I'm wrong).  If
that's the case, then you'll need to try to find another way to write
your query such that SQLite executes it more optimally.

   If I remember correctly, the old trick to make sure you didn't use an
index was to change your WHERE clause slightly to something like:

   SELECT COUNT(*) FROM (SELECT a FROM data WHERE (a + 0) <= 18234721
GROUP BY a);

   This should force a table scan rather than an index scan.  Try that
and see how your queries perform; if things go faster, that means the
index is hurting you, and you need to either drop it or rewrite your
queries to avoid using it.

   -Tom

Reply via email to