> I'm intrigued. How do you get SQLite to use a multi-column index as its
> primary key (i.e. B-tree hash)? Please elaborate.

Simply:

CREATE TABLE TransactionList (
    sTransactionID Text(13) DEFAULT '',
    sCommunityID Text(13) DEFAULT '',
    sObject Text(13),
    PRIMARY KEY (sCommunityID, sTransactionID)
);

> Have you investigated the following to see how the optimizer deals with 
> it?
> 
> SELECT sCommunityID, max(sTransactionID)
>   FROM TransactionList
>  GROUP BY sCommunityID
> HAVING sCommunityID IN ('a030600001bFi','a030600004KDy', 
> 'a030600008QhK')

I'm not fluent in the VDBE code, but ISTM (and execution time confirms)
that it does a full table scan first for the GROUP BY, then creates a
temp table holding the values from the IN list and joins against it (I
suppose).
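For what it's worth, recent SQLite versions can print a readable summary of
the plan without wading through raw VDBE opcodes; running the query above
under EXPLAIN QUERY PLAN should show whether it really does fall back to a
full scan of TransactionList:

-- Summarise the chosen plan instead of dumping VDBE opcodes
EXPLAIN QUERY PLAN
SELECT sCommunityID, max(sTransactionID)
  FROM TransactionList
 GROUP BY sCommunityID
HAVING sCommunityID IN ('a030600001bFi','a030600004KDy','a030600008QhK');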

> I think I'd normally err on keeping a filter table handy (temporary if
> you wish):
> 
> -- Initialise filter:
> CREATE TABLE filter (sCommunityID TEXT);
> CREATE INDEX filter_idx ON filter(sCommunityID);
> 
> -- Configure filter:
> INSERT INTO filter VALUES ('a030600001bFi');
> INSERT INTO filter VALUES ('a030600004KDy');
> INSERT INTO filter VALUES ('a030600008QhK');
> 
> -- Get your results:
> SELECT sCommunityID, max(sTransactionID)
>   FROM filter NATURAL INNER JOIN TransactionList
>  GROUP BY sCommunityID;
> 
> -- Clear filter ready for next time:
> DELETE FROM filter;

Thread safety is an issue here: multiple threads may be doing this at the
same time, against the same filter table. Also, with the (admittedly
contorted) query I am using, I can get the VDBE to create the temporary
table and insert the data for me, which takes a lot less time than running
all those INSERT statements would.
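That said, one way around the thread-safety concern, assuming each thread
has its own database connection, would be to make the filter a TEMP table:
SQLite keeps TEMP tables private to the connection that created them, so the
threads would not see each other's filters. A rough sketch:

-- Per-connection filter; a TEMP table is visible only to this connection
CREATE TEMP TABLE filter (sCommunityID TEXT);

INSERT INTO filter VALUES ('a030600001bFi');
INSERT INTO filter VALUES ('a030600004KDy');
INSERT INTO filter VALUES ('a030600008QhK');

SELECT t.sCommunityID, max(t.sTransactionID)
  FROM filter f JOIN TransactionList t ON t.sCommunityID = f.sCommunityID
 GROUP BY t.sCommunityID;

-- Clear (or DROP) the filter ready for the next batch
DELETE FROM filter;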
 
> I feel fairly confident that that method would make good use of your
> existing index.

Yes, I'm sure too. See my comment about joining to the Community table 
(that still requires the list of values, but it's fast because of the way 
the query is constructed).
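The rough shape of that query is something like the following (Community is
the table holding the community IDs; its definition isn't shown in this
thread). It still needs the list of values, but it should let the join and
GROUP BY be driven off the (sCommunityID, sTransactionID) primary key:

SELECT c.sCommunityID, max(t.sTransactionID)
  FROM Community c
  JOIN TransactionList t ON t.sCommunityID = c.sCommunityID
 WHERE c.sCommunityID IN ('a030600001bFi','a030600004KDy','a030600008QhK')
 GROUP BY c.sCommunityID;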

> You can use a nice sqlite3_bind_text() for those inserts :)

I'm using Python and PySQLite, but I can do that through executemany().
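At the SQL level that is just a parameterised insert, with executemany()
supplying the sequence of IDs (the exact placeholder spelling depends on the
driver version):

-- executemany() runs this once for each ID in the supplied sequence
INSERT INTO filter (sCommunityID) VALUES (?);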

> If you really, really wanted you could munge the filter table into a
> sequence of UNIONs, but it would be nowhere near as elegant. This method
> easily extends to any number of filtered sCommunityIDs, while anything
> in a single query is going to start getting "icky".

Yes, hence my question about the maximum size of queries. It wouldn't be a
problem to build several big queries and run them, as the running time is
still O(N); the only question is how long the query engine takes to compile
the SQL when the statement gets very long.
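For comparison, the UNION form for a handful of IDs would look something
like this. Each branch should be a cheap lookup against the (sCommunityID,
sTransactionID) primary key, so the worry really is just how large the SQL
text gets as the list grows:

SELECT sCommunityID, max(sTransactionID) FROM TransactionList
 WHERE sCommunityID = 'a030600001bFi'
UNION ALL
SELECT sCommunityID, max(sTransactionID) FROM TransactionList
 WHERE sCommunityID = 'a030600004KDy'
UNION ALL
SELECT sCommunityID, max(sTransactionID) FROM TransactionList
 WHERE sCommunityID = 'a030600008QhK';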
 
> Probably too late, but I've also found that indexing, grouping and
> joining are far faster on INT columns. If you could store sCommunityID
> as an INT...

Interesting. It is, indeed, way too late :-)
 
> Please correct me if any of the above is junk, I'm still learning :)

If we stop learning then someone should call the undertaker ;-)

Hugh
