Robert,

> [snip]
>
> > i said i print these rows to /dev/null too in my perl code. plus the
> > perl code does some other things such as joining these rows with other
> > hashes and summing the numbers.
>
> That's fine. I was merely trying to account for the 50% speed difference
> between the two differing column tests, which has been accomplished.
>
> > > As for the temp table ... I haven't tried this, but isn't "temp" a reserved
> > > word in SQLite? More importantly, you should be doing this statement inside
> >
> > yes, it is. i really want to create 'temporary table' in memory. i was
> > really hoping it would speed things up.
>
> I misread the statement, so ignore me on that part. However, 339,000 rows
> into a temporary in-memory table ... I tried some experiments locally here
> and none of them took more than 2 seconds to execute. Are you sure you're
> not using up all available memory, which is causing the system to hit the
> swapfile? What does this same query look like when you drop the "temp" from
> the query?
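[editor's note: a minimal sketch of the in-memory temp-table idea discussed above, using Python's stdlib sqlite3 module rather than the sqlite3 shell; the table/column names (data, a, n1) come from the thread, the row counts here are made up for illustration]

```python
import sqlite3

# Force SQLite to keep temporary tables in memory via PRAGMA temp_store
# (2 = MEMORY). Using an in-memory database here purely for illustration.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA temp_store = MEMORY")

con.execute("CREATE TABLE data (a INTEGER, n1 INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(i, i * 10) for i in range(1000)])

# Same shape as the statement in the thread: TEMP TABLE ... AS SELECT
con.execute("CREATE TEMP TABLE foo AS SELECT * FROM data WHERE a <= 500")
count = con.execute("SELECT COUNT(*) FROM foo").fetchone()[0]
print(count)
```

With temp_store set to MEMORY the temp table never touches the disk, which is what the "temporary table in memory" suggestion is after; whether that actually helps depends on how much memory the 339,000-row result needs.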
the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
'top'. the SIZE and RES did not exceed 30M, so i do not think memory is
the issue here.

> time sqlite3 db 'create table foo as select * from data where a <= 18234721' > /dev/null
22.06u 1.39s 0:27.75 84.5%

so it takes pretty much the same time without 'temp'. i'm starting to
suspect the disk. here is what i did: i created a separate database with
only one table. this table contains a subset of 333,392 rows from the
original data table. it also has the same index on the "a" column, i.e.
i did

> sqlite3 db1
sqlite> attach db as s;
sqlite> create table data as select * from s.data where a <= 18234721;
sqlite> create index data_by_a on data (a);

full scan:

> time sqlite3 db1 'select n1 from data' > /dev/null
17.19u 0.55s 0:19.06 93.0%

"bad" index scan -- "bad" because the table is guaranteed to contain only
keys that match the "where" clause:

> time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of an indexed scan? is this what it
really takes to seek back and forth between the index and the data? what
am i missing here?

thanks,
max
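[editor's note: the +10 seconds is consistent with the index-to-table seek the post suspects: the index on (a) yields rowids, and each n1 value requires a separate lookup in the table. a way to test that hypothesis is a covering index containing both columns, so the scan never leaves the index. a sketch using Python's stdlib sqlite3 module; the index name data_by_a_n1 and the small dataset are illustrative, not from the thread]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (a INTEGER, n1 INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(i, i) for i in range(1000)])

# Plain index on (a): fetching n1 forces a seek back into the table
# for every matching index entry.
con.execute("CREATE INDEX data_by_a ON data (a)")
plan1 = con.execute(
    "EXPLAIN QUERY PLAN SELECT n1 FROM data WHERE a <= 500").fetchall()

# Covering index on (a, n1): n1 is stored in the index itself, so the
# query can be answered from the index alone.
con.execute("CREATE INDEX data_by_a_n1 ON data (a, n1)")
plan2 = con.execute(
    "EXPLAIN QUERY PLAN SELECT n1 FROM data WHERE a <= 500").fetchall()

print(plan1)
print(plan2)  # the second plan should report a COVERING INDEX
```

if the covering-index version of the real query drops back near full-scan speed, the extra 10 seconds really is the cost of seeking between the index b-tree and the table b-tree.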