Maksim,

Some things you could try:
1) Increase the cache size. You may be causing a lot of cache misses if the size of the query result is very large compared to the size of the cache. Index-based searches can reload the same page multiple times because of a lack of locality in the cache; an index-less scan will just load each page once. As an experiment, try 'PRAGMA cache_size = 300000' before you run the query. 1 GB of RAM should be able to support 300 MB of cache.

2) Use 8192-byte pages. Larger pages seem to improve performance quite a bit, in my experience. Do 'PRAGMA page_size = 8192' before you create the database.

Doing both may cause excessive memory use (200000 * 8K = ...). I've never tried that.

Gé

Maksim Yevmenkin wrote:

>Robert,
>
>>[snip]
>>
>>>i said i print these rows to /dev/null too in my perl code. plus the
>>>perl code does some other things such as joining these rows with other
>>>hashes and summing the numbers.
>>
>>That's fine. I was merely trying to account for the 50% speed difference
>>between the two differing column tests, which has been accomplished.
>>
>>>>As for the temp table ... I haven't tried this, but isn't
>>>>"temp" a reserved
>>>>word in SQLite? More importantly, you should be doing this
>>>>statement inside
>>>
>>>yes, it is. i really want to create 'temporary table' in memory. i was
>>>really hoping it would speed things up.
>>
>>I misread the statement, so ignore me on that part. However, 339,000 rows
>>into a temporary in-memory table ... I tried some experiments locally here
>>and none of them took more than 2 seconds to execute. Are you sure you're
>>not using up all available memory, which is causing the system to hit the
>>swapfile? What does this same query look like when you drop the "temp" from
>>the query?
>
>the system has 1G of ram. i was "monitoring" sqlite3 memory usage with
>'top'. the SIZE and RES did not exceed 30M. so i do not think the
>memory is the issue here.
>
>>time sqlite3 db 'create table foo as select * from data where a <= 18234721' > /dev/null
>
>22.06u 1.39s 0:27.75 84.5%
>
>so pretty much the same time without 'temp'.
>
>i'm starting to suspect disk. here is what i did. i created a separate
>database with only one table. this table contains subset of 333392
>rows from original data table. it also has the same index on "a"
>column, i.e. i did
>
>>sqlite3 db1
>
>sqlite> attach db as s;
>sqlite> create table data as select * from s.data where a <= 18234721;
>sqlite> create index data_by_a on data (a);
>
>full scan
>
>>time sqlite3 db1 'select n1 from data' > /dev/null
>
>17.19u 0.55s 0:19.06 93.0%
>
>"bad" index scan, because it is guaranteed then the table only has
>keys that match "where"
>
>>time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
>
>25.73u 0.59s 0:28.37 92.7%
>
>+10 seconds! is this the overhead of "indexed" scan? is this what it
>really takes to seek back and forth between index and data? what am i
>missing here?
>
>thanks,
>max
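P.S. A sketch of how the two suggestions at the top could be tried together, using the sqlite3 command line as in your tests. The database name "newdb" and the (a, n1) table layout are just placeholders, not your actual schema. Two details matter: page_size only takes effect if it is set before the first table is created, and cache_size only lasts for the connection that issues it, so it has to go in the same sqlite3 invocation as the query you are timing.

```shell
# Sketch only: "newdb" and the (a, n1) layout are placeholders.
rm -f newdb

# page_size must be set before any table exists in the file.
sqlite3 newdb <<'SQL'
PRAGMA page_size = 8192;
CREATE TABLE data (a INTEGER, n1 INTEGER);
CREATE INDEX data_by_a ON data (a);
SQL

# cache_size is per-connection, so issue it together with the query.
time sqlite3 newdb 'PRAGMA cache_size = 300000; SELECT n1 FROM data WHERE a <= 18234721;' > /dev/null
```

You can confirm the page size took effect afterwards with 'sqlite3 newdb "PRAGMA page_size;"', which should print 8192.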