Re: [sqlite] Re: Query performance issues - index selection

Joe Wilson Sat, 03 Jun 2006 08:15:52 -0700

If you test against 3.2.1, just use your original where clause 
and original index pk_data:


 where utime >= 1146441600 and utime < 1149120000

as I don't know what effect the unary +'s would have on it.

--- Joe Wilson <[EMAIL PROTECTED]> wrote:
> For my databases, GROUP BY is slower in recent SQLite releases
> because my queries are not able to use indexed columns for 
> GROUP BY items by design:
> 
>  http://www.mail-archive.com/sqlite-users%40sqlite.org/msg15563.html
> 
> I'm curious what sort of timings you'd get with SQLite 3.2.1.
> http://sqlite.org/sqlite-3.2.1.tar.gz
> 
> To convert to the older database format:
> 
>  sqlite334 334.db .dump | sqlite321 321.db
> 
> And then perform your timings with both SQLite versions against 
> the 321.db file to level the playing field.
> Just having an index on utime should be optimal for SQLite 3.2.1.
> 
> --- Steve Green <[EMAIL PROTECTED]> wrote:
> > Using Richard's suggestion of changing the where clause of my query
> > to
> > 
> > where +utime >= 1146441600 and +utime < 114912000
> > 
> > did force sqlite to use the index that gave better performance.
> > 
> > However, I'm seeing some strange behavior that I'm hoping someone can
> > shed some light on.
> > 
> > With the time period mentioned below, the data set is about 2.5 million
> > rows and and 86K distinct u_id values.  Using the index on (u_id, utime),
> > the query time was reduced from 13.5 minutes to 26 seconds (not great,
> > but at least the browser won't timeout waiting for a response).
> > 
> > However, with a different time period, I have a much smaller data set of
> > about 150K rows and 20K distinct u_id values.  Using the index on (u_id,
> > utime), the query still takes about 20 seconds.  However, if the primary
> > index on (utime, r_id, u_id) is used, the query only takes 0.5 seconds.
> > Unfortunately at query time I have no idea of knowing how much data is
> > going to have to be traversed, so the idea of modifying the query to force
> > the use of different indexes is not possible.  Can anyone explain why the
> > performance is so poor with the smaller data set and the "optimal" index.
> > Any suggestions on a workaround so that optimal performance can be achieved
> > with all data set sizes?
> > 
> > Thanks for your help,
> > 
> > Steve
> > 
> > Steve Green wrote:
> > 
> > > Sorry, I forgot to mention that I'm using sqlite v3.3.4 on redhat linux
> > > v7.3
> > > 
> > > Steve
> > > 
> > > Steve Green wrote:
> > > 
> > >> Hi,
> > >>
> > >> I have the following schema
> > >>
> > >> CREATE TABLE data(
> > >>     utime int4,
> > >>     r_id int2,
> > >>     u_id int4,
> > >>     data1 int8,
> > >>     data2 int8
> > >> );
> > >>
> > >> Each row is uniquely defined by utime, r_id, and u_id, so I have the
> > >> following index
> > >>
> > >> CREATE UNIQUE INDEX pk_data on data( utime, r_id, u_id );
> > >>
> > >> This index is also needed because at update time, I use "insert or
> > >> replace" to update the data table.
> > >>
> > >> The type of query that I'm attempting to perform is similar to
> > >>
> > >> select u_id, sum( data1 ), sum( data2 )
> > >> from data where utime >= 1146441600 and utime < 1149120000
> > >> group by u_id
> > >> order by sum( data1 ) desc
> > >> limit 10
> > >>
> > >> My current table has about 2.5 million rows and about 86,000 distinct
> > >> u_id values in the time period selected, and the query takes about 13.5
> > >> minutes.
> > >>
> > >> Performing an explain query plan reveals
> > >>
> > >> 0|0|TABLE data WITH INDEX pk_data
> > >>
> > >> so the primary index is being used...
> > >>
> > >> Based on some past experiences, I added the following index to the table
> > >>
> > >> CREATE INDEX ix_data_ut on data( u_id, utime );
> > >>
> > >> Note that the utime is the last parameter in the index.  With the primary
> > >> index in place, I was not able to convince sqlite to use this index.  
> > >> To test
> > >> the index, I was forced to drop the primary index (which I can't do in my
> > >> production environment).   After dropping the primary index, an 
> > >> explain query
> > >> plan revealed
> > >>
> > >> 0|0|TABLE data WITH INDEX ix_data_ut
> > >>
> > >> and the query ran in 26 seconds...
> > >>
> > >> Subsequent tests using the following indexes provided no performance 
> > >> improvement
> > >> over the unique index, although sqlite's query planner chose these 
> > >> indexes over
> > >> the unique index
> > >>
> > >> CREATE INDEX ix_data_tu on data( utime, u_id );
> > >> CREATE INDEX ix_data_t on data( utime );
> > >>
> > >> So, is there a way that I can get sqlite to use the optimal index 
> > >> without having
> > >> to remove my unique index?  Perhaps this would involve rewritting the 
> > >> query, but
> > >> I'm at a loss as to how that could be done.
> > >>
> > >> Thanks for your time,
> > >>
> > >> Steve


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Re: [sqlite] Re: Query performance issues - index selection

Reply via email to