Martin O'Leary <[EMAIL PROTECTED]> wrote: > Hi guys, > > I have a table like the following: > > CREATE TABLE user_actions ( > uid INTEGER NOT NULL, > actionid INTEGER NOT NULL, > time INTEGER NOT NULL, > status INTEGER NOT NULL, > PRIMARY KEY (uid, actionid, time, status) > ); > > And I want to carry out a query something like this: > > SELECT uid, actionid, MAX(time), status FROM user_actions GROUP BY > uid, actionid;
Are you thinking that the value of "status" returned will be the one which has the maximum value for "time"? SQL doesn't work that way. To understand why not, consider this query: SELECT uid, actionid, MAX(time), MIN(time), status FROM user_actions GROUP BY uid, actionid; > > Here, I find two problems. Firstly, because my table doesn't have an > INTEGER PRIMARY KEY, I get an autogenerated index on my compound key. > No problem there. However, and I may be missing something, it seems > that there's a lot of data duplication going on. All the data from my > main table is available in the index (status shouldn't really be part > of the key, but I added it in order to increase query performance). As > far as I can tell, this means that the main table is never consulted, > but just sits there, doubling the size of my database. Is there any > way around this? All data is duplicated - it appears in both the table and in the index. There is no way around that. > > Secondly, query performance is quite slow. It seems to me that no > optimisation is being carried out on the MAX(time) expression. Is this > the case, and if so, why not? Surely it's possible to do this in a > nice, logarithmic way. > The query as you have specified it runs in O(NlogN) time where N is the number of rows in the table. Perhaps an enterprise-class RDBMS with a really big and really expensive query optimizer can do better, but SQLite isn't that smart. I suggest you do the query "manually". To begin with, use SQLite version 3.3.0 (which supports DESC indices) and create your table with a separate index like this: CREATE TABLE user_actions ( uid INTEGER NOT NULL, actionid INTEGER NOT NULL, time INTEGER NOT NULL, status INTEGER NOT NULL ); CREATE INDEX user_actions_pk ON user_actions(uid,actionid,time DESC,status); The DESC attribute on the time field of the index means that index entries will be in descending time order instead of the default ascending. Hence, the first entry in the index for a particular uid+actionid will be the one with the largest time value. Find the first record in O(logN) time like this: SELECT * FROM user_actions ORDER BY uid, actionid, time DESC LIMIT 1; Find all other actions for the same user this way: SELECT * FROM user_actions WHERE uid=:previous_uid AND actionid>:previous_actionid ORDER BY uid, actionid, time DESC LIMIT 1; Once you have all records for a single user, advance to the next user this way: SELECT * FROM user_actions WHERE uid>:previous_uid ORDER BY uid, actionid, time DESC LIMIT 1 Repeat until done. Runtime is O(MlogN) where N is the number of rows in the table and M is the number of rows of output. Since M is likely much less than N, this approach will be much faster. -- D. Richard Hipp <[EMAIL PROTECTED]>