On Fri, May 23, 2008 at 3:01 PM, Jeff Gibson <[EMAIL PROTECTED]> wrote:
> I'm sorry if this is an obvious question - I'm new to databases. I have > an application where the database is used to log a large number of > simulation events. The database is written once and read many times > (i.e., there are never any inserts or updates after database creation). > The three most interesting tables I have are: > > CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER, aid INTEGER, > subtype INTEGER); > > CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER, seqnum > INTEGER, tid INTEGER, instid INTEGER); > > CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT, verbose > INTEGER, PRIMARY KEY(type,subtype) ); > > The column names are such that columns in different tables with the same > name act as foreign keys. The largest (and most often queried) table is > events, and it can have many millions of entries. The actions table is > also large (about a fifth as big as events) and subtypes is very small > (dozens of entries). My application involves querying events many > times, but very common queries include events that match a particular > verbose value and/or a particular type value. This leads to queries > that have one or two joins, and such queries are substantially slower > than just a query on just the events table. > The question is, what can I do to speed up those queries? The > obvious answer would be to put type and verbose as columns in the events > table, but they would be redundant. Is that par for the course, or is > there some best practice I'm overlooking? > Thanks, > Jeff > Well, I'd like to note that in general, reporting databases are denormalized in order to improve performance. However, you shouldn't need to do that for a case this simple. Let's start by creating your database and having fun with EXPLAIN QUERY PLAN. I created the tables as you provided and did this: create view eventdetail as select e.eid as eid, e.time as time, a.aid as aid, a.seqnum as seqnum, a.tid as tid, a.instid as instid, s.type as type, s.subtype as subtype, s.name as name, s.verbose as verbose from events e join actions a on a.aid=e.aid join subtypes s on s.subtype = e.subtype; This makes life a lot easier. Besides, SQLite is extremely well-written and will handle this stuff beautifully. ( I noticed that you have a 'type' on both the 'actions' table and the 'subtypes' table. I assume that they are the same thing and used the version from subtypes. ) sqlite> explain query plan select * from eventdetail where type=123; orde from deta ---- ------------- ---- 0 0 TABLE events AS e 1 1 TABLE actions AS a USING PRIMARY KEY 2 2 TABLE subtypes AS s WITH INDEX sqlite_autoindex_subtypes_1 You can see here that we are table-scanning 'events'. This is bad. The solution here is to add an index so events can be searched by subtype: [[ create index events_subtype_ix on events(subtype); ]] sqlite> explain query plan select * from eventdetail where type=123; orde from deta ---- ------------- ---- 0 2 TABLE subtypes AS s WITH INDEX sqlite_autoindex_subtypes_1 1 0 TABLE events AS e WITH INDEX events_subtype_ix 2 1 TABLE actions AS a USING PRIMARY KEY As you can see here, SQLite is actually figuring out which subtypes have type=123, then looking up that subset of the 'events' table using the newly created index, then joining to 'actions' based on the 'aid' column. All in all, if there are very many different top-level types, the first one will only find a few subtypes (the term is 'high selectivity'). In general, EXPLAIN QUERY PLAN <whatever> will give you a good idea on what SQLite is doing to perform the requested actions. Index tweaking and ANALYZE (http://www.sqlite.org/lang_analyze.html) will enable you to filter out the amount of data SQLite has to consider when returning a resultset. -- -- Stevie-O Real programmers use COPY CON PROGRAM.EXE _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users