On Tue, 12 Apr 2005, Maksim Yevmenkin wrote:

>Dear SQLite users,
>
>consider this
>
> [snip]
>
>it only took 4+ seconds to read, parse, perform the hash table lookups
>and sum the data. note that for the 1417 unique keys it had to do a
>hash lookup and a hash insert.
>
>so, just with a plain ascii file i get four times the speed i get with
>sqlite. note that my c program will scale linearly with the size of the
>dataset (just like i see with sqlite).
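A minimal sketch of the kind of flat-file scan being described (the file
name, the column layout and the choice of summed column are assumptions,
not the snipped code):

/* scan a whitespace-separated ascii file, hash on the integer key in
 * the first column, and sum one float column per key.  columns are
 * assumed to mirror the table below: a b c d e n1 n2. */
#include <stdio.h>

#define NBUCKETS 8192              /* power of two, well above 1417 keys */

static struct { long key; double sum; int used; } tab[NBUCKETS];

int main(void)
{
    char line[256];
    unsigned i, nkeys = 0;
    FILE *fp = fopen("data.txt", "r");

    if (fp == NULL) { perror("data.txt"); return 1; }
    while (fgets(line, sizeof(line), fp) != NULL) {
        long a;
        double n1;
        if (sscanf(line, "%ld %*d %*s %*d %*d %lf", &a, &n1) != 2)
            continue;
        /* open addressing: lookup, insert on first sight of a key */
        i = (unsigned long)a & (NBUCKETS - 1);
        while (tab[i].used && tab[i].key != a)
            i = (i + 1) & (NBUCKETS - 1);
        tab[i].used = 1;
        tab[i].key = a;
        tab[i].sum += n1;
    }
    fclose(fp);

    for (i = 0; i < NBUCKETS; i++)
        if (tab[i].used)
            nkeys++;
    printf("%u unique keys\n", nkeys);
    return 0;
}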
>so,
>
>- what am i doing wrong here?

Not managing your expectations. Try the test with another SQL database,
and see what sort of speed you get.

>- any suggestions on how to speed things up with sqlite?

Buy a faster machine. No, I'm not being facetious.

>- is sqlite optimized to retrieve one row from the table?

When fetching that one row using an index, yes; that's what indexes are
for. But when an index scan touches a significant proportion of the
corresponding table, it becomes less than optimal, due to the reasons
given previously in the thread:

- Cache thrashing
- Index + table access
- CPU overhead
- Non-linear file access

>i'm starting to think that having index and data in the same file is
>not such a great idea when you want to perform range requests.

Unless the index and data are on different disks, you'll get no benefit
from splitting them into separate files.

>thanks,
>max

Christian

>On Apr 12, 2005 11:33 AM, Maksim Yevmenkin <[EMAIL PROTECTED]> wrote:
>> Gé,
>>
>> thanks for the suggestion. unfortunately it did not make any
>> difference :( below are the results. as you can see it takes 7+
>> seconds to "group by" 333,392 records, and i'm grouping by a column
>> on which i have an index. again, i'm not a database guy, but i think
>> that is slow. perhaps someone can comment on whether that is the best
>> sqlite can do?
>>
>> sqlite> .schema data
>> CREATE TABLE data(
>>   a INTEGER,
>>   b INTEGER,
>>   c CHAR,
>>   d INTEGER,
>>   e INTEGER,
>>   n1 FLOAT,
>>   n2 FLOAT
>> );
>> CREATE INDEX data_by_a on data (a);
>>
>> sqlite> select count(*) from data;
>> 333392
>>
>> test-1.sql
>> ----------
>> PRAGMA cache_size = 300000;
>> PRAGMA cache_size;
>> PRAGMA page_size;
>> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
>>
>> test-2.sql
>> ----------
>> PRAGMA cache_size;
>> PRAGMA page_size;
>> SELECT count(*) FROM (SELECT a FROM data WHERE a <= 18234721 GROUP BY a);
>>
>> > time sqlite3 db < test-1.sql
>> 300000
>> 1024
>> 1417
>> 6.89u 0.33s 0:07.55 95.6%
>>
>> > time sqlite3 db < test-2.sql
>> 2000
>> 1024
>> 1417
>> 6.91u 0.19s 0:07.39 96.0%
>>
>> > time sqlite3 db2048 < test-1.sql
>> 300000
>> 2048
>> 1417
>> 6.80u 0.08s 0:07.32 93.9%
>>
>> > time sqlite3 db2048 < test-2.sql
>> 2000
>> 2048
>> 1417
>> 6.77u 0.12s 0:07.10 97.0%
>>
>> > time sqlite3 db4096 < test-1.sql
>> 300000
>> 4096
>> 1417
>> 6.80u 0.15s 0:07.21 96.3%
>>
>> > time sqlite3 db4096 < test-2.sql
>> 2000
>> 4096
>> 1417
>> 6.79u 0.15s 0:07.15 97.0%
>>
>> > time sqlite3 db8192 < test-1.sql
>> 300000
>> 8192
>> 1417
>> 6.70u 0.11s 0:07.01 97.1%
>>
>> > time sqlite3 db8192 < test-2.sql
>> 2000
>> 8192
>> 1417
>> 6.73u 0.09s 0:07.01 97.2%
>>
>> thanks,
>> max
>>
>> On Apr 12, 2005 7:10 AM, Gé Weijers <[EMAIL PROTECTED]> wrote:
>> > Maksim,
>> >
>> > Some things you could try:
>> >
>> > 1) increase cache memory
>> >
>> > You may be causing a lot of cache misses if the size of the query
>> > result is very large compared to the size of the cache. Index-based
>> > searches can cause multiple reloads of the same page because of a
>> > lack of locality in the cache. An index-less search will just load
>> > each page once.
>> >
>> > As an experiment, try 'PRAGMA cache_size = 300000' before you run
>> > the query. 1 GB of RAM should be able to support 300 MB of cache.
>> >
>> > 2) use 8192-byte pages
>> >
>> > Larger pages seem to improve performance quite a bit, in my
>> > experience.
>> >
>> > Do 'PRAGMA page_size = 8192' before you create the database.
>> >
>> > Doing both may cause excessive memory use (300000 * 8K = ...). I've
>> > never tried that.
>> >
>> > Gé
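A note on Gé's two pragmas: 'page_size' only takes effect before the
database file is created, while 'cache_size' is per-connection and has
to be reissued on every open. A minimal sketch of both through the
SQLite C API ('newdb' is a hypothetical file name; error handling is
minimal):

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = 0;

    if (sqlite3_open("newdb", &db) != SQLITE_OK)
        return 1;

    /* must run before the first table is created; fixed thereafter */
    sqlite3_exec(db, "PRAGMA page_size = 8192;", 0, 0, &err);

    /* per-connection setting: reissue this on every open */
    sqlite3_exec(db, "PRAGMA cache_size = 300000;", 0, 0, &err);

    sqlite3_exec(db,
                 "CREATE TABLE data(a INTEGER, b INTEGER, c CHAR,"
                 " d INTEGER, e INTEGER, n1 FLOAT, n2 FLOAT);"
                 "CREATE INDEX data_by_a ON data(a);",
                 0, 0, &err);
    if (err != 0) {
        fprintf(stderr, "sqlite error: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}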
>> > Maksim Yevmenkin wrote:
>> >
>> > >Robert,
>> > >
>> > >>[snip]
>> > >>
>> > >>>i said i print these rows to /dev/null too in my perl code. plus
>> > >>>the perl code does some other things, such as joining these rows
>> > >>>with other hashes and summing the numbers.
>> > >>
>> > >>That's fine. I was merely trying to account for the 50% speed
>> > >>difference between the two differing column tests, which has been
>> > >>accomplished.
>> > >>
>> > >>>>As for the temp table ... I haven't tried this, but isn't "temp"
>> > >>>>a reserved word in SQLite? More importantly, you should be doing
>> > >>>>this statement inside
>> > >>>
>> > >>>yes, it is. i really want to create a 'temporary table' in
>> > >>>memory. i was really hoping it would speed things up.
>> > >>
>> > >>I misread the statement, so ignore me on that part. However,
>> > >>339,000 rows into a temporary in-memory table ... I tried some
>> > >>experiments locally here and none of them took more than 2 seconds
>> > >>to execute. Are you sure you're not using up all available memory,
>> > >>which is causing the system to hit the swapfile? What does this
>> > >>same query look like when you drop the "temp" from the query?
>> > >
>> > >the system has 1G of ram. i was "monitoring" sqlite3 memory usage
>> > >with 'top'. the SIZE and RES did not exceed 30M, so i do not think
>> > >memory is the issue here.
>> > >
>> > >> time sqlite3 db 'create table foo as select * from data where a <= 18234721' > /dev/null
>> > >
>> > >22.06u 1.39s 0:27.75 84.5%
>> > >
>> > >so pretty much the same time without 'temp'.
>> > >
>> > >i'm starting to suspect the disk. here is what i did: i created a
>> > >separate database with only one table. this table contains the
>> > >subset of 333392 rows from the original data table. it also has the
>> > >same index on the "a" column, i.e. i did
>> > >
>> > >> sqlite3 db1
>> > >
>> > >sqlite> attach db as s;
>> > >sqlite> create table data as select * from s.data where a <= 18234721;
>> > >sqlite> create index data_by_a on data (a);
>> > >
>> > >full scan
>> > >
>> > >> time sqlite3 db1 'select n1 from data' > /dev/null
>> > >
>> > >17.19u 0.55s 0:19.06 93.0%
>> > >
>> > >"bad" index scan, because it is guaranteed that the table only has
>> > >keys that match the "where" clause
>> > >
>> > >> time sqlite3 db1 'select n1 from data where a <= 18234721' > /dev/null
>> > >
>> > >25.73u 0.59s 0:28.37 92.7%
>> > >
>> > >+10 seconds! is this the overhead of an "indexed" scan? is this
>> > >what it really takes to seek back and forth between index and data?
>> > >what am i missing here?
>> > >
>> > >thanks,
>> > >max

--
 /"\
 \ /   ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
  X    - AGAINST MS ATTACHMENTS
 / \
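The full-scan versus indexed-scan comparison above can also be driven
through the SQLite C API, stepping each query to completion and timing
it. A minimal sketch, assuming the 'db1' file built above; clock()
measures CPU time, and error handling is minimal:

#include <stdio.h>
#include <time.h>
#include <sqlite3.h>

/* run a query to completion, summing column 0 so every row is touched,
   and report the CPU time it took */
static void time_query(sqlite3 *db, const char *sql)
{
    sqlite3_stmt *stmt;
    double sum = 0.0;
    clock_t start = clock();

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK) {
        fprintf(stderr, "prepare: %s\n", sqlite3_errmsg(db));
        return;
    }
    while (sqlite3_step(stmt) == SQLITE_ROW)
        sum += sqlite3_column_double(stmt, 0);
    sqlite3_finalize(stmt);

    printf("%-50s %.2fs cpu (sum %g)\n", sql,
           (double)(clock() - start) / CLOCKS_PER_SEC, sum);
}

int main(void)
{
    sqlite3 *db;

    if (sqlite3_open("db1", &db) != SQLITE_OK)
        return 1;

    /* full scan: each table page is read once, in file order */
    time_query(db, "SELECT n1 FROM data");

    /* index scan: every index hit seeks back into the table for n1 */
    time_query(db, "SELECT n1 FROM data WHERE a <= 18234721");

    sqlite3_close(db);
    return 0;
}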