On Thu, 2006-06-08 at 03:58 +0200, Michael Sizaki wrote:
> Hi Mark,
> 
> have you tried to do a VACUUM on the database?
> It helps a lot when it comes to the 'read ahead'
> feature of the database.
> 
> Michael

Michael,

Well, I just repeated my tests to see if a vacuum would noticeably
improve the query times that I have been seeing.  Here are my findings:

Cold Cache before vacuum: 217s
Warm Cache before vacuum: 1.6s
Vacuum time: 1314s = 21m54s
Cold Cache after vacuum:  206s
Warm Cache after vacuum:  1.6s

Of course, right after performing the vacuum the database would have
been cached, so I made sure to clear my file cache before running the
post-vacuum tests.  I can get the database into the file cache just by
running 'dd if=log.db of=/dev/null', and that only takes 16 seconds.
But of course the file cache only helps when there is enough memory in
the machine to hold the database.  Since my query only uses one of the
indexes and never reads the table proper, I suspect that getting just
the part of the file that contains the index into memory would be just
as good.
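
If I want to see how big that index actually is, the sqlite3_analyzer
tool from the SQLite site should break the file down into pages used
per table and per index (I haven't tried it against this database yet):

sqlite3_analyzer log.db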

Thanks for the suggestion,
Mark.


> Mark Drago wrote:
> > Hello,
> > 
> > I'm writing a web cache and I want to use SQLite to store the log of all
> > of the accesses made through the web cache.  The idea is to install this
> > web cache in large institutions (1000-5000 workstations).  The log
> > database can grow in size very quickly and can reach into the gigabytes
> > after just a few days.
> > 
> > Writing to the database is speedy enough that I haven't seen much of a
> > problem.  I collect the data for 1000 web requests and then insert them
> > all in a single transaction using a prepared statement.  This works
> > rather well.
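> > 
> > In SQL terms the batch boils down to something like this sketch (only
> > a few columns shown, and the values are made up; the real code binds,
> > steps, and resets a single prepared INSERT 1000 times between the
> > BEGIN and the COMMIT):
> > 
> > BEGIN;
> > INSERT INTO log(add_dte, host, url)
> >   VALUES(datetime('now'), 'example.com', 'http://example.com/');
> > -- ...999 more executions of the same prepared statement...
> > COMMIT;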
> > 
> > The problem that I'm encountering has to do with generating reports on
> > the data in the log database.  SQLite shows good performance on some
> > simple queries, but not once something more advanced is involved, such
> > as an aggregate function.  Moreover, once the SQLite file is cached in
> > memory it is really quick.  However, I can't count on this file being
> > cached at all when a user goes to run the report.  So, I've been
> > clearing my file cache before running a test, and then running the
> > same test again once everything has been loaded into the cache.  Like
> > I said, for most cases SQLite is fine, but here is one example where
> > it doesn't fare as well.
> > 
> > The system that I'm running these tests on is a P4 2.8GHz HT with 1 GB
> > of RAM running Fedora Core 5 and using SQLite version 3.3.3 (since that
> > is what comes with FC5).  I'm doing my tests with a database that is
> > 732M in size and contains 1,280,881 records (the DB schema is included
> > below).
> > 
> > I clear the file cache by running the following command; I wait until
> > it consumes all of the machine's memory and then kill it:
> > perl -e '@f[0..100000000]=0'
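> > 
> > A cleaner alternative, on kernels that have /proc/sys/vm/drop_caches
> > (it appeared in 2.6.16), is to drop the page cache directly as root:
> > sync && echo 3 > /proc/sys/vm/drop_caches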
> > 
> > I'm running the tests by running the following script:
> > #!/bin/bash
> > echo "$1;" | sqlite3 log.db > /dev/null
> > 
> > The query I'm running is the following:
> > select count(host), host from log group by host;
> > 
> > The results include the first time the query is run (when the file is
> > not cached) and then the times of a few runs after that (when the file
> > is cached).
> > 
> > SQLite: 221.9s, 1.6s, 1.6s, 1.6s
> >  MySQL:   2.2s, 1.8s, 1.8s, 1.8s
> > 
> > The MySQL tests were done with the following script:
> > #!/bin/bash
> > mysql -u root --database=log -e "$1" > /dev/null
> > 
> > It is apparent that SQLite is reading the entire database off the disk
> > and MySQL somehow is not.  The MySQL query cache is not in use on
> > this machine and MySQL does not claim very much memory for itself before
> > the test is conducted (maybe 30M).
> > 
> > I've tried looking into the output from 'explain' to see if SQLite was
> > using the index that I have on the 'host' column, but I don't think it
> > is.  The output from 'explain' is included below.  Note that the
> > 'explain' output is from a different machine running SQLite 3.3.5
> > compiled from source, as the SQLite on FC5 kept segfaulting when I
> > tried to use 'explain'.
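> > 
> > For a more readable summary than the opcode dump, 'explain query plan'
> > should also work, assuming this build supports it; it prints one line
> > per table naming any index used:
> > 
> > sqlite> explain query plan select count(host), host from log group by host;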
> > 
> > Any information or ideas on how to speed up this query are greatly
> > appreciated.  The only unimplemented idea I have right now is to remove
> > some of the duplicated data from the schema in an attempt to reduce the
> > size of the average row in the table.  In some cases I can store just an
> > integer where I'm storing both the integer and a descriptive string
> > (category_name and category_no for example).  Some of the other
> > information in the schema holds data about things that are internal to
> > the web cache (profile*, ad*, etc.).
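> > 
> > As a sketch, splitting the category pair out would look something like
> > this (a hypothetical lookup table; the log table would then keep only
> > category_no, and a report joins the name back in when it needs it):
> > 
> > CREATE TABLE category(
> >   category_no integer primary key,
> >   category_name varchar(80));
> > 
> > SELECT c.category_name, count(*)
> >   FROM log l JOIN category c ON l.category_no = c.category_no
> >  GROUP BY c.category_name;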
> > 
> > Thank you very much for any ideas,
> > Mark.
> > 
> > TABLE SCHEMA:
> > CREATE TABLE log(
> > log_no integer primary key,
> > add_dte datetime,
> > profile_name varchar(255),
> > workstation_ip integer,
> > workstation_ip_txt varchar(20),
> > verdict integer,
> > verdict_reason varchar(255),
> > category_name varchar(80),
> > category_no integer,
> > set_user_name varchar(255),
> > profile_zone varchar(40),
> > profile_zone_no integer,
> > author_user_name varchar(255),
> > workstation_name varchar(255),
> > workstation_group_name varchar(255),
> > profile_association varchar(255),
> > profile_association_no integer,
> > protocol varchar(40),
> > connection_type varchar(255),
> > connection_type_no integer,
> > host varchar(255),
> > url text,
> > ad_username varchar(255),
> > ad_groups text,
> > ad_domain varchar(255),
> > ad_workstation_name varchar(255),
> > ad_last_update_dte datetime);
> > 
> > INDEXES:
> > CREATE INDEX add_dte ON log (add_dte);
> > CREATE INDEX profile_name ON log(profile_name);
> > CREATE INDEX workstation_ip ON log(workstation_ip);
> > CREATE INDEX verdict ON log (verdict);
> > CREATE INDEX research_zone_no ON log(research_zone_no);
> > CREATE INDEX profile_zone_no ON log(profile_zone_no);
> > CREATE INDEX workstation_name ON log(workstation_name);
> > CREATE INDEX workstation_group_name ON log(workstation_group_name);
> > CREATE INDEX profile_association_no ON log(profile_association_no);
> > CREATE INDEX connection_type_no ON log(connection_type_no);
> > CREATE INDEX host ON log(host);
> > CREATE INDEX ad_username on log(ad_username);
> > CREATE INDEX ad_domain on log(ad_domain);
> > CREATE INDEX ad_workstation_name on log(ad_workstation_name);
> > 
> > EXPLAIN OUTPUT:
> > sqlite> explain select count(host), host from log group by host;
> > 0|Noop|0|0|
> > 1|MemInt|0|3|
> > 2|MemInt|0|2|
> > 3|Goto|0|16|
> > 4|MemInt|1|3|
> > 5|Return|0|0|
> > 6|IfMemPos|2|8|
> > 7|Return|0|0|
> > 8|AggFinal|0|1|count(1)
> > 9|MemLoad|0|0|
> > 10|MemLoad|1|0|
> > 11|Callback|2|0|
> > 12|Return|0|0|
> > 13|MemNull|1|0|
> > 14|MemNull|0|0|
> > 15|Return|0|0|
> > 16|Gosub|0|13|
> > 17|Goto|0|41|
> > 18|Integer|0|0|
> > 19|OpenRead|2|20|keyinfo(1,BINARY)
> > 20|SetNumColumns|2|2|
> > 21|Rewind|2|38|
> > 22|RowKey|2|0|
> > 23|IdxIsNull|0|37|
> > 24|Column|2|0|
> > 25|MemStore|5|0|
> > 26|MemLoad|4|0|
> > 27|Eq|512|32|collseq(BINARY)
> > 28|MemMove|4|5|
> > 29|Gosub|0|6|
> > 30|IfMemPos|3|40|
> > 31|Gosub|0|13|
> > 32|Column|2|0|
> > 33|AggStep|0|1|count(1)
> > 34|Column|2|0|
> > 35|MemStore|1|1|
> > 36|MemInt|1|2|
> > 37|Next|2|22|
> > 38|Close|2|0|
> > 39|Gosub|0|6|
> > 40|Halt|0|0|
> > 41|Transaction|0|0|
> > 42|VerifyCookie|0|19|
> > 43|Goto|0|18|
> > 44|Noop|0|0|
> 
