Re: [sqlite] Multi column ORDER BY across table peformance problem....

John Machin Tue, 21 Apr 2009 19:32:34 -0700

On 22/04/2009 10:48 AM, sorka wrote:
> Wow. Anybody? I figured this would be a simple question for the gurus on this
> board. Seriously, nobody knows a better way to do this?

Seriously, consider alternative possibilities

> 
> 
> sorka wrote:
>> This should be simple but apparently it isn't.
>>
>> I have two tables:
>>                 "CREATE TABLE showing ( "
>>                 "    showingId            INTEGER PRIMARY KEY, "
>>                 "    stationId              INTEGER, "
>>                 "    startTime              INTEGER, "
>>                 ") "
>> CREATE INDEX showing_startTime on showing(startTime);
>>
>>
>> AND
>>                 "CREATE TABLE channel ( "
>>                 "  channelId            INTEGER PRIMARY KEY, "
>>                 "  ChannelNumber   INTEGER, "
>>                 "  stationId            INTEGER, "
>>                 "  ) "
>> CREATE INDEX channel_ChannelNumber on channel(ChannelNumber);
>> CREATE INDEX channel_stationId on channel(stationId);
>>
>> When I do this select:
>> SELECT showing.startTime FROM showing JOIN channel ON showing.startTime >=
>> 1239230000 AND showing.stationId = channel.stationId ORDER BY
>> showing.startTime LIMIT 8;

SQLite is smart enough to recognise that the showing_startTime index 
satisfies the ORDER BY clause ... so it's just traversing that index, 
filtering out cases of bogus stationID [I presume that's the reason for 
having the JOIN at all], and stopping when it's found 8 results.

What is the result of:
SELECT COUNT(*) FROM showing JOIN channel ON showing.startTime >=
1239230000 AND showing.stationId = channel.stationId;
?
Is that 8 or 100000 or somewhere in between?

>>
>> I get back the correct 8 results in about 3 milliseconds. 
>>
>> If I throw a secondary order term in there say ChannelNumber:
>>
>> SELECT showing.startTime FROM showing JOIN channel ON showing.startTime >=
>> 1239230000 AND showing.stationId = channel.stationId ORDER BY
>> showing.startTime, channel.ChannelMajorNumber LIMIT 8;

What is channel.ChannelMajorNumber? It's not in the schema that you 
posted. People who might be interested in helping you might want to try 
running your queries prefaced with EXPLAIN and EXPLAIN QUERY PLAN (which 
doesn't need any data, just the CREATE statements). Perhaps they then 
give up.

Your tables contain only presumingly meaningless integers. [You have 
given us the full schema for each table, haven't you? There aren't any 
50 Kb BLOBs in there, are there?] So you wouldn't have a problem with 
making a zipped db with the two tables (no indexes) available on the web 
somewhere for people who wanted to experiment?

>>
>> It now takes over 120 seconds!!!!!!!
>>
>> I've tried various multi-column indices including one on
>> channel(stationId, ChannelNumber).

And was that index used by the query? Did you have ChannelNumber in the 
ORDER BY clause at the time, or something else?

>>
>> No difference.
>>
>> As far as I can tell, when ordering on columns that cross tables, sqlite
>> will bring in all the records that match the equality or inequality search
>> term on the first column before it does a secondary sort.

What any SQL query engine has to do is produce results that are the same 
as those obtained by this method:
1. Produce temporary result set ignoring the ORDER BY and the 
LIMIT/OFFSET. This will include all columns from the SELECT plus any 
extra that appear only in the ORDER BY.
2. Sort the results according to the ORDER BY.
3. Read the sort output, discarding columns that appear only in the 
ORDER BY, and filtering according to the LIMIT/OFFSET.

Is that what you mean?

>>
>> I have over 100,000 records in the showing table and about 100 records in
>> the channel table.
>>
>> Sqlite should be smart enough to do the secondary sort on ChannelNumber as
>> soon as it sees that the records coming back have a later startTime than
>> the previous one. i.e. Say the first 5 records have the same startTime and
>> different channel numbers. Then the next 5 records have a later start time
>> than the first. Sqlite should be smart enough to see this as the results
>> come back and do a secondary sort on ChannelNumber on the first 5 results
>> and then rinse and repeat.

What is "the secondary sort on ChannelNumber" that you imply is already 
being done, just not soon enough?

>>
>> What appears to be happening is that even though startTime is indexed,
>> sqlite is bringing all 100K records into memory sorted by time before it
>> starts to sort by channel number. 

Apart from that being one possible explanation of what does seem a long 
time, what grounds do you have for the above statement?

>>
>> Is there a way to get sqlite to do the right thing? If there was only a
>> way to have a multi-column index that included columns from different
>> tables. Oh wait, there is, it's called an intermediate table. However the
>> cost of doing this is pretty high for reasons I can't go into here. 
>>
>> Any ideas? Maybe I'm just doing something wrong and this should be simple.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Multi column ORDER BY across table peformance problem....

Reply via email to