For my databases, GROUP BY is slower in recent SQLite releases because, by design, the query planner will not use indexed columns to satisfy the GROUP BY items:

http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html
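To illustrate, a minimal sketch (the table and index are made up, and the plan line in the comment is what I'd expect from the affected releases, not guaranteed output):

    CREATE TABLE t( g int4, v int8 );
    CREATE INDEX ix_t_g on t( g );

    -- Even with ix_t_g available, the affected releases report a plain
    -- table scan, e.g.
    --
    --     0|0|TABLE t
    --
    -- rather than "TABLE t WITH INDEX ix_t_g"; the grouping is done with
    -- a separate sorting pass instead of by walking the index in order.
    EXPLAIN QUERY PLAN SELECT g, sum( v ) FROM t GROUP BY g;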
I'm curious what sort of timings you'd get with SQLite 3.2.1:

http://sqlite.org/sqlite-3.2.1.tar.gz

To convert to the older database format:

    sqlite334 334.db .dump | sqlite321 321.db

Then perform your timings with both SQLite versions against the 321.db file to level the playing field. Just having an index on utime should be optimal for SQLite 3.2.1.

--- Steve Green <[EMAIL PROTECTED]> wrote:

> Using Richard's suggestion of changing the where clause of my query to
>
>     where +utime >= 1146441600 and +utime < 1149120000
>
> did force sqlite to use the index that gave better performance.
>
> However, I'm seeing some strange behavior that I'm hoping someone can
> shed some light on.
>
> With the time period mentioned below, the data set is about 2.5 million
> rows and 86K distinct u_id values. Using the index on (u_id, utime),
> the query time was reduced from 13.5 minutes to 26 seconds (not great,
> but at least the browser won't time out waiting for a response).
>
> However, with a different time period, I have a much smaller data set
> of about 150K rows and 20K distinct u_id values. Using the index on
> (u_id, utime), the query still takes about 20 seconds. However, if the
> primary index on (utime, r_id, u_id) is used, the query takes only 0.5
> seconds. Unfortunately, at query time I have no way of knowing how much
> data will have to be traversed, so modifying the query to force the use
> of different indexes is not possible. Can anyone explain why the
> performance is so poor with the smaller data set and the "optimal"
> index? Any suggestions on a workaround so that optimal performance can
> be achieved with all data set sizes?
>
> Thanks for your help,
>
> Steve
>
> Steve Green wrote:
>
> > Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat
> > linux v7.3
> >
> > Steve
> >
> > Steve Green wrote:
> >
> >> Hi,
> >>
> >> I have the following schema:
> >>
> >>     CREATE TABLE data(
> >>         utime int4,
> >>         r_id  int2,
> >>         u_id  int4,
> >>         data1 int8,
> >>         data2 int8
> >>     );
> >>
> >> Each row is uniquely defined by utime, r_id, and u_id, so I have the
> >> following index:
> >>
> >>     CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
> >>
> >> This index is also needed because at update time I use "insert or
> >> replace" to update the data table.
> >>
> >> The type of query that I'm attempting to perform is similar to:
> >>
> >>     select u_id, sum( data1 ), sum( data2 )
> >>     from data where utime >= 1146441600 and utime < 1149120000
> >>     group by u_id
> >>     order by sum( data1 ) desc
> >>     limit 10
> >>
> >> My current table has about 2.5 million rows and about 86,000
> >> distinct u_id values in the time period selected, and the query
> >> takes about 13.5 minutes.
> >>
> >> Performing an explain query plan reveals
> >>
> >>     0|0|TABLE data WITH INDEX pk_data
> >>
> >> so the primary index is being used...
> >>
> >> Based on some past experiences, I added the following index to the
> >> table:
> >>
> >>     CREATE INDEX ix_data_ut on data( u_id, utime );
> >>
> >> Note that utime is the last column in the index. With the primary
> >> index in place, I was not able to convince sqlite to use this index.
> >> To test the index, I was forced to drop the primary index (which I
> >> can't do in my production environment). After dropping the primary
> >> index, an explain query plan revealed
> >>
> >>     0|0|TABLE data WITH INDEX ix_data_ut
> >>
> >> and the query ran in 26 seconds...
> >>
> >> Subsequent tests using the following indexes provided no performance
> >> improvement over the unique index, although sqlite's query planner
> >> chose these indexes over the unique index:
> >>
> >>     CREATE INDEX ix_data_tu on data( utime, u_id );
> >>     CREATE INDEX ix_data_t on data( utime );
> >>
> >> So, is there a way that I can get sqlite to use the optimal index
> >> without having to remove my unique index? Perhaps this would involve
> >> rewriting the query, but I'm at a loss as to how that could be done.
> >>
> >> Thanks for your time,
> >>
> >> Steve
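For reference, here is the workaround in one self-contained sketch against Steve's schema; the plan line in the final comment is inferred from Steve's report that the unary + forced the better index, not output I've captured myself:

    CREATE TABLE data(
        utime int4,
        r_id  int2,
        u_id  int4,
        data1 int8,
        data2 int8
    );
    CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
    CREATE INDEX ix_data_ut on data( u_id, utime );

    -- The unary + makes the utime terms unusable for index lookups, so
    -- pk_data no longer looks attractive and the planner falls back to
    -- ix_data_ut:
    EXPLAIN QUERY PLAN
    select u_id, sum( data1 ), sum( data2 )
    from data where +utime >= 1146441600 and +utime < 1149120000
    group by u_id
    order by sum( data1 ) desc
    limit 10;

    -- expected: 0|0|TABLE data WITH INDEX ix_data_ut

Keep in mind Steve's caveat above: ix_data_ut wins on the large time window but loses badly to pk_data on the small one, so forcing it with + is not a universal fix.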