On Mon, Feb 21, 2011 at 11:05 AM, Sven L <larvpo...@hotmail.se> wrote: > > Thank you for your detailed explanation! > First, can you please tell me how to purge the cache in Windows 7? This could > be very useful for my tests!
Sorry, dunno for Windows. On Mac OSX it is the purge command, in the development tools. On Linux, you do: echo 3 > /prog/sys/vm/drop_caches > > I'm quite sure my database itself is not fragmented, since I have only > inserted data. If you insert the data in primary key order, or you don't have any key (SQLite generates one) it's probably not very fragmented. But keep in mind that while to you, things are happening sequentially as you add records, inside SQLite, things are happening quite differently. For example, let's say you have 4 indexes on this database. So for every insert you do, there are 5 database pages being affected (one for the data row, one for each index). These pages will fill up at different rates, depending on the size of your keys, and will be written to disk at different times. So you are very likely to have data pages and various index pages all intermixed in your SQLite file. Also with multiple indexes, it's unlikely that every index will be in sorted order, based on the records you are inserting. So IMO, there's no practical way to avoid fragmentation within the SQLite file. > The file system is in good shape too; Windows reports 0% fragmentation. > Perhaps there is some other bottleneck, like disk performance in general > (this is a 5400 rpm)? Or antivirus-related CPU holds? (I did turn real-time > scanning off though.) > I have even turned Windows Search off (and got a 20% performance gain!). My > 32-bit application is running under Windows 7 (64-bit). Could WOW64 have > something to do with this performance issue? > > The size of the database is 1,98 GB, with 1.5M rows. Couldn't it be that the > table scan simply is darn slow for huge tables? How much free RAM do you have? Windows being the pig that it is, I'm guessing not much, unless your system has > 4GB of RAM. For comparison, you could write a program that just read through the file, 4K at a time (I think that's the default page size for SQLite on Windows). Starting with a cold cache, this is near the best time you will ever see for count(*) in SQLite. > > In an ideal world the COUNT() would be performed on the primary key in RAM. > That's not possible? :P Sure, if all of the primary key records are in either the SQLite cache or OS cache. If they aren't, you have seek time. I just did a small test to compare count(*) with count(primary key). Here is count(*): sqlite> explain select count(*) from logs; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Goto 0 8 0 00 2 OpenRead 1 12 0 keyinfo(1,BINARY) 00 3 Count 1 1 0 00 4 Close 1 0 0 00 5 SCopy 1 2 0 00 6 ResultRow 2 1 0 00 7 Halt 0 0 0 00 8 Transaction 0 0 0 00 9 VerifyCookie 0 23 0 00 10 TableLock 0 8 0 logs 00 11 Goto 0 2 0 00 Here is count(primary key): sqlite> explain select count(logid) from logs; addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Trace 0 0 0 00 1 Null 0 2 0 00 2 Null 0 1 0 00 3 Goto 0 14 0 00 4 OpenRead 0 8 0 0 00 5 Rewind 0 9 0 00 6 Rowid 0 3 0 00 7 AggStep 0 3 1 count(1) 01 8 Next 0 6 0 01 9 Close 0 0 0 00 10 AggFinal 1 1 0 count(1) 00 11 SCopy 1 4 0 00 12 ResultRow 4 1 0 00 13 Halt 0 0 0 00 14 Transaction 0 0 0 00 15 VerifyCookie 0 23 0 00 16 TableLock 0 8 0 logs 00 17 Goto 0 4 0 00 It's not clear to me whether for count(*), SQLite is reading each data page or reading the primary key index pages. But when timed, count(*) is 16x faster than count(logid) the first time, and 8x faster once all pages are cached. This is just what happens on my system (Mac OSX), and may have nothing at all to do with the performance on yours. Jim > > Thanks again! > > >> Date: Mon, 21 Feb 2011 10:17:03 -0500 >> From: pri...@gmail.com >> To: sqlite-users@sqlite.org >> Subject: Re: [sqlite] COUNT() extremely slow first time! >> >> This is a common issue on the mailing list. The first time you do >> count(*), SQLite (actually your OS) has to load data into memory. >> Most OS's will keep the file in a buffer cache, so the 2nd count(*) >> doesn't have to read from disk. >> >> Here's a timing from my own system, after a purge command to clear the >> buffer cache: >> >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> ---------- >> 734909 >> >> real 0m0.580s >> user 0m0.190s >> sys 0m0.034s >> >> Same command again, with the file cached: >> >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> ---------- >> 734909 >> >> real 0m0.189s >> user 0m0.165s >> sys 0m0.019s >> >> This time is consistent no matter how many times I run it, because the >> file is still cached. Doing a purge command to clear the cache and >> re-running the query, we get: >> >> $ purge >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> ---------- >> 734909 >> >> real 0m0.427s >> user 0m0.175s >> sys 0m0.024s >> >> On my system, there is not a huge difference, but it is consistent. >> Now, if you have a fragmented file system, you will see a much larger >> difference. There are many posts on the mailing list about both file >> system fragmentation and logical fragmentation within the SQLite file >> itself. Your first count(*) is subject to these fragmentation >> effects, while your 2nd usually is not, because the file is in memory. >> >> Some people on the list believe fragmentation is an unimportant detail >> you shouldn't worry about, because you have little control over it. >> That may be true, but it's useful to understand how it can affect >> performance. I think you are seeing this first hand. >> >> Jim >> -- >> HashBackup: easy onsite and offsite Unix backup >> http://www.hashbackup.com >> >> >> >> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote: >> > >> > Same result :( >> > Note that I have compiled SQLite with the following switches: >> > SQLITE_ENABLE_STAT2 >> > SQLITE_THREADSAFE=2 >> > >> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the >> > engine has to traverse all columns and it might even return another value >> > if there are NULL-values... >> > >> > Also, this is quite interesting: >> > >> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items; >> > 0|0|0|SCAN TABLE Items (~1000000 rows) >> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items; >> > sqlite> >> > >> > I would expect an index scan on the first statement. The second statement >> > tells me nada?! >> > >> > Thanks for your help! >> > >> > >> >> From: slav...@bigfraud.org >> >> Date: Mon, 21 Feb 2011 14:24:50 +0000 >> >> To: sqlite-users@sqlite.org >> >> Subject: Re: [sqlite] COUNT() extremely slow first time! >> >> >> >> >> >> On 21 Feb 2011, at 2:23pm, Sven L wrote: >> >> >> >> > SELECT COUNT(ItemID) FROM Items; >> >> > >> >> > This takes around 40 seconds the first time! WHY?! >> >> >> >> Try again, doing everything identically except that instead of the above >> >> line use >> >> >> >> SELECT COUNT(*) FROM Items; >> >> >> >> Simon. >> >> _______________________________________________ >> >> sqlite-users mailing list >> >> sqlite-users@sqlite.org >> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > >> > _______________________________________________ >> > sqlite-users mailing list >> > sqlite-users@sqlite.org >> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users