Re: [sqlite] COUNT() extremely slow first time!

Jim Wilcoxson Mon, 21 Feb 2011 08:35:30 -0800

On Mon, Feb 21, 2011 at 11:05 AM, Sven L <larvpo...@hotmail.se> wrote:
>
> Thank you for your detailed explanation!
> First, can you please tell me how to purge the cache in Windows 7? This could 
> be very useful for my tests!


Sorry, dunno for Windows.  On Mac OSX it is the purge command, in the
development tools.  On Linux, you do: echo 3 >
/prog/sys/vm/drop_caches

>
> I'm quite sure my database itself is not fragmented, since I have only 
> inserted data.

If you insert the data in primary key order, or you don't have any key
(SQLite generates one) it's probably not very fragmented.  But keep in
mind that while to you, things are happening sequentially as you add
records, inside SQLite, things are happening quite differently.  For
example, let's say you have 4 indexes on this database.  So for every
insert you do, there are 5 database pages being affected (one for the
data row, one for each index).  These pages will fill up at different
rates, depending on the size of your keys, and will be written to disk
at different times.  So you are very likely to have data pages and
various index pages all intermixed in your SQLite file.  Also with
multiple indexes, it's unlikely that every index will be in sorted
order, based on the records you are inserting.  So IMO, there's no
practical way to avoid fragmentation within the SQLite file.

> The file system is in good shape too; Windows reports 0% fragmentation. 
> Perhaps there is some other bottleneck, like disk performance in general 
> (this is a 5400 rpm)? Or antivirus-related CPU holds? (I did turn real-time 
> scanning off though.)
> I have even turned Windows Search off (and got a 20% performance gain!). My 
> 32-bit application is running under Windows 7 (64-bit). Could WOW64 have 
> something to do with this performance issue?
>
> The size of the database is 1,98 GB, with 1.5M rows. Couldn't it be that the 
> table scan simply is darn slow for huge tables?

How much free RAM do you have?  Windows being the pig that it is, I'm
guessing not much, unless your system has > 4GB of RAM.  For
comparison, you could write a program that just read through the file,
4K at a time (I think that's the default page size for SQLite on
Windows).  Starting with a cold cache, this is near the best time you
will ever see for count(*) in SQLite.

>
> In an ideal world the COUNT() would be performed on the primary key in RAM. 
> That's not possible? :P

Sure, if all of the primary key records are in either the SQLite cache
or OS cache.  If they aren't, you have seek time.

I just did a small test to compare count(*) with count(primary key).
Here is count(*):

sqlite> explain select count(*) from logs;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Trace          0     0     0                    00
1     Goto           0     8     0                    00
2     OpenRead       1     12    0     keyinfo(1,BINARY)  00
3     Count          1     1     0                    00
4     Close          1     0     0                    00
5     SCopy          1     2     0                    00
6     ResultRow      2     1     0                    00
7     Halt           0     0     0                    00
8     Transaction    0     0     0                    00
9     VerifyCookie   0     23    0                    00
10    TableLock      0     8     0     logs           00
11    Goto           0     2     0                    00

Here is count(primary key):

sqlite> explain select count(logid) from logs;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Trace          0     0     0                    00
1     Null           0     2     0                    00
2     Null           0     1     0                    00
3     Goto           0     14    0                    00
4     OpenRead       0     8     0     0              00
5     Rewind         0     9     0                    00
6     Rowid          0     3     0                    00
7     AggStep        0     3     1     count(1)       01
8     Next           0     6     0                    01
9     Close          0     0     0                    00
10    AggFinal       1     1     0     count(1)       00
11    SCopy          1     4     0                    00
12    ResultRow      4     1     0                    00
13    Halt           0     0     0                    00
14    Transaction    0     0     0                    00
15    VerifyCookie   0     23    0                    00
16    TableLock      0     8     0     logs           00
17    Goto           0     4     0                    00

It's not clear to me whether for count(*), SQLite is reading each data
page or reading the primary key index pages.  But when timed, count(*)
is 16x faster than count(logid) the first time, and 8x faster once all
pages are cached.  This is just what happens on my system (Mac OSX),
and may have nothing at all to do with the performance on yours.

Jim

>
> Thanks again!
>
>
>> Date: Mon, 21 Feb 2011 10:17:03 -0500
>> From: pri...@gmail.com
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>> This is a common issue on the mailing list. The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>>
>> Here's a timing from my own system, after a purge command to clear the
>> buffer cache:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real 0m0.580s
>> user 0m0.190s
>> sys 0m0.034s
>>
>> Same command again, with the file cached:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real 0m0.189s
>> user 0m0.165s
>> sys 0m0.019s
>>
>> This time is consistent no matter how many times I run it, because the
>> file is still cached. Doing a purge command to clear the cache and
>> re-running the query, we get:
>>
>> $ purge
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> ----------
>> 734909
>>
>> real 0m0.427s
>> user 0m0.175s
>> sys 0m0.024s
>>
>> On my system, there is not a huge difference, but it is consistent.
>> Now, if you have a fragmented file system, you will see a much larger
>> difference. There are many posts on the mailing list about both file
>> system fragmentation and logical fragmentation within the SQLite file
>> itself. Your first count(*) is subject to these fragmentation
>> effects, while your 2nd usually is not, because the file is in memory.
>>
>> Some people on the list believe fragmentation is an unimportant detail
>> you shouldn't worry about, because you have little control over it.
>> That may be true, but it's useful to understand how it can affect
>> performance. I think you are seeing this first hand.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>> >
>> > Same result :(
>> > Note that I have compiled SQLite with the following switches:
>> > SQLITE_ENABLE_STAT2
>> > SQLITE_THREADSAFE=2
>> >
>> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
>> > engine has to traverse all columns and it might even return another value 
>> > if there are NULL-values...
>> >
>> > Also, this is quite interesting:
>> >
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
>> > 0|0|0|SCAN TABLE Items (~1000000 rows)
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
>> > sqlite>
>> >
>> > I would expect an index scan on the first statement. The second statement 
>> > tells me nada?!
>> >
>> > Thanks for your help!
>> >
>> >
>> >> From: slav...@bigfraud.org
>> >> Date: Mon, 21 Feb 2011 14:24:50 +0000
>> >> To: sqlite-users@sqlite.org
>> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
>> >>
>> >>
>> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>> >>
>> >> > SELECT COUNT(ItemID) FROM Items;
>> >> >
>> >> > This takes around 40 seconds the first time! WHY?!
>> >>
>> >> Try again, doing everything identically except that instead of the above 
>> >> line use
>> >>
>> >> SELECT COUNT(*) FROM Items;
>> >>
>> >> Simon.
>> >> _______________________________________________
>> >> sqlite-users mailing list
>> >> sqlite-users@sqlite.org
>> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> > _______________________________________________
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] COUNT() extremely slow first time!

Reply via email to