Re: [sqlite] schema optimization question

Stephen Oberholtzer Fri, 23 May 2008 13:44:59 -0700

On Fri, May 23, 2008 at 3:01 PM, Jeff Gibson <[EMAIL PROTECTED]> wrote:


> I'm sorry if this is an obvious question - I'm new to databases.  I have
> an application where the database is used to log a large number of
> simulation events.  The database is written once and read many times
> (i.e., there are never any inserts or updates after database creation).
> The three most interesting tables I have are:
>
> CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER, aid INTEGER,
> subtype INTEGER);
>
> CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER, seqnum
> INTEGER, tid INTEGER, instid INTEGER);
>
> CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT, verbose
> INTEGER, PRIMARY KEY(type,subtype) );
>
> The column names are such that columns in different tables with the same
> name act as foreign keys.  The largest (and most often queried) table is
> events, and it can have many millions of entries.  The actions table is
> also large (about a fifth as big as events) and subtypes is very small
> (dozens of entries).  My application involves querying events many
> times, but very common queries include events that match a particular
> verbose value and/or a particular type value.  This leads to queries
> that have one or two joins, and such queries are substantially slower
> than just a query on just the events table.
>    The question is, what can I do to speed up those queries?  The
> obvious answer would be to put type and verbose as columns in the events
> table, but they would be redundant.  Is that par for the course, or is
> there some best practice I'm overlooking?
>    Thanks,
>    Jeff
>

Well, I'd like to note that in general, reporting databases are denormalized
in order to improve performance.
However, you shouldn't need to do that for a case this simple.

Let's start by creating your database and having fun with EXPLAIN QUERY
PLAN.  I created the tables as you provided and did this:

create view eventdetail as select e.eid as eid, e.time as time, a.aid as
aid, a.seqnum as seqnum, a.tid as tid, a.instid as instid, s.type as type,
s.subtype as subtype, s.name as name, s.verbose as verbose from events e
join actions a on a.aid=e.aid join subtypes s on s.subtype = e.subtype;


This makes life a lot easier.  Besides, SQLite is extremely well-written and
will handle this stuff beautifully.
( I noticed that you have a 'type' on both the 'actions' table and the
'subtypes' table. I assume that they are the same thing and used the version
from subtypes. )

sqlite> explain query plan select * from eventdetail where type=123;
orde  from           deta
----  -------------  ----
0     0              TABLE events AS e
1     1              TABLE actions AS a USING PRIMARY KEY
2     2              TABLE subtypes AS s WITH INDEX
sqlite_autoindex_subtypes_1

You can see here that we are table-scanning 'events'.   This is bad.  The
solution here is to add an index so events can be searched by subtype:

[[ create index events_subtype_ix on events(subtype); ]]

sqlite> explain query plan select * from eventdetail where type=123;
orde  from           deta
----  -------------  ----
0     2              TABLE subtypes AS s WITH INDEX
sqlite_autoindex_subtypes_1
1     0              TABLE events AS e WITH INDEX events_subtype_ix
2     1              TABLE actions AS a USING PRIMARY KEY

As you can see here, SQLite is actually figuring out which subtypes have
type=123, then looking up that subset of the 'events' table using the newly
created index, then joining to 'actions' based on the 'aid' column.  All in
all, if there are very many different top-level types, the first one will
only find a few subtypes (the term is 'high selectivity').

In general, EXPLAIN QUERY PLAN <whatever> will give you a good idea on what
SQLite is doing to perform the requested actions.  Index tweaking and
ANALYZE (http://www.sqlite.org/lang_analyze.html) will enable you to filter
out the amount of data SQLite has to consider when returning a resultset.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] schema optimization question

Reply via email to