Hmm - My original pastebin is gone now (oops), so I reposted with the output 
from the sqlite3_ tracing functions (instead of our own performance logger) 
here: http://mozilla.pastebin.org/441814.  Let me know if you see any obvious 
indexes to add, or other obvious problems with how we're accessing the database.

Thanks again,
Mike

----- Original Message -----
From: "Mike Hays" <mi...@songbirdnest.com>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Friday, July 30, 2010 2:25:27 PM
Subject: Re: [sqlite] Optimizing Songbird

Hi Richard,

Thanks for getting back to me so quickly! :)

You're absolutely right that changing memory/cache usage doesn't seem to affect 
performance much.  I've pastebin'd a performance log here: 
http://mozilla.pastebin.org/434277.  It definitely looks like we have some 
duplicate queries, which I've been looking into.  I think this should cover 
your first suggestion (and maybe the second?), but let me know if you need more 
info about what's going on in the Songbird code.  I'll be looking into 
sqlite3_stmt_status() and sqlite3_profile() over the next few days.

Thanks again,
Mike

----- Original Message -----
From: "Richard Hipp" <d...@sqlite.org>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Friday, July 30, 2010 1:11:02 PM
Subject: Re: [sqlite] Optimizing Songbird

On Fri, Jul 30, 2010 at 3:26 PM, Mike Hays <mi...@songbirdnest.com> wrote:

> Hi sqlite-users,
>
> I'm working on improving the SQLite database performance for Songbird, an
> open-source media player based on Firefox/Mozilla.  We're seeing pretty poor
> performance with large media libraries - ideally we'd like to support
> 100,000 media items and maintain reasonable performance.  I'd really
> appreciate any help you can give (and so will our users)!
>
> For reference, you can see the majority of our code that interacts with the
> DB here:
> http://src.songbirdnest.com/source/xref/client/components/dbengine/src/DatabaseEngine.cpp
> And the schema is here:
> http://src.songbirdnest.com/source/xref/client/components/library/localdatabase/content/schema.sql
> The high level is that we store a GUID for each media item and some
> top-level properties in one table (media_items), and all the user-visible
> metadata (track name, artist, album, etc.) in another (resource_properties).
>  We use custom collation for things like sorting numbers correctly (so "2
> Live Crew" comes before "10,000 Maniacs") and localization.
>
> I've updated to 3.7.0 (which seems to have better start-up and shut-down
> times, yay!) and poked around for a few days, and discovered some areas I'd
> like some help investigating.  The first is memory usage - I'm pretty new to
> this stuff so bear with me ;)
>
> My test library has around 85,000 tracks, which ends up using about 300MB
> of DB.  I have pre-allocated a 16000 page cache, with 16KB pages.  My memory
> usage after using the DB for a while looks like:
> DumpMemoryStatistics() format   Current Highwater
> SQLITE_STATUS_MEMORY_USED:      264849264       560992624
> SQLITE_STATUS_PAGECACHE_USED:   8       15996
> SQLITE_STATUS_PAGECACHE_OVERFLOW:       264435640       560259280
> SQLITE_STATUS_SCRATCH_USED:     0       0
> SQLITE_STATUS_SCRATCH_OVERFLOW: 0       102664
> SQLITE_STATUS_MALLOC_SIZE       328     135807
> SQLITE_STATUS_PARSER_STACK      0       0
> SQLITE_STATUS_PAGECACHE_SIZE    16384   16384
> SQLITE_STATUS_SCRATCH_SIZE      80536   102664
>
> What surprises me is that the Highwater page-cache usage is nearly 100%,


That you have a massive page cache and that it is all being used sometimes
is a clue that you are doing full table scans rather than indexed queries.
Adding the right indices will likely make a huge difference.  Changing
database sizes and cache sizes probably won't make much difference at all.

Here are some things to try:

(1) Use the sqlite3_trace() interface to get a log of all SQL statements you
are running.  This log, together with the schema of your database file, is
useful to us in helping to determine which indices you need.

(2) Consider using sqlite3_profile() to figure out which queries are taking
time.  Focus on optimizing the slow ones.

(3) Read about the sqlite3_stmt_status() interface.  Just before you
sqlite3_finalize() or sqlite3_reset() each prepared statement, using
sqlite3_stmt_status() to see how many full-scan steps and how many sorts it
had to do.  If the answer to either question is non-zero, you found a
potential place to optimize using an index.

An example of (3) can be seen in the source code to Fossil in
http://www.fossil-scm.org/fossil/artifact/c5ac4be0c11b7b0b6da0abe7ccf1680e87698b19(search
for the symbol "db_stats").  If compiled with FOSSIL_DEBUG, an extra
"db_stats()" function exists which runs sqlite3_stmt_status() after each SQL
statement is evaluated and inserts warning message in the output if any
inefficient queries are run.  This is very helpful in finding inefficient
queries, even for small datasets - too small to notice the inefficiencies in
causal usage.




> but the Current usage is generally less than 10 pages.  Bumping the page
> size up to 32KB resulted in about twice as many pages in cache after some
> usage, but that still results in massive overflow (which I'm assuming is
> significantly slower).  My initial guess is that the majority of the
> accesses are just too large to fit in cache... Does this sound reasonable?
>  How would I go about correcting this?
>
> I'd also like some help with prepared statements and indexing.  We use some
> seemingly complex queries, and I'm having trouble understanding which
> indexes and prepared statements are actually useful to improve the
> performance of these queries.  Being relatively unfamiliar with SQL, some
> documentation is probably what I'm after at the moment.
>
> Lastly, if you have any suggestions on the schema we're using, that would
> probably be very helpful as well.
>
> Big thanks in advance :)
>
> -Mike
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
---------------------
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to