Hello,

I am developing a small app for determining the available combinations
of reagents stored in a sqlite database where no two reagents have the
same conjugate. I have tried two approaches to this query, one has
very good performance and the other is very poor. I am trying to
understand why the second has such bad performance. Both queries
return the correct answer.

The first option creates temp tables populated from select statements,
then a cross join between all of the temp tables with a big where
clause to get just the rows where no conjugate appears more than once.
This query executes in < 2 seconds.

CREATE TEMP TABLE 'CD4'  AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD4';
CREATE TEMP TABLE 'CD19' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD19';
CREATE TEMP TABLE 'CD20' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD20';
CREATE TEMP TABLE 'CD21' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD21';
CREATE TEMP TABLE 'CD22' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD22';
CREATE TEMP TABLE 'CD23' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD23';
CREATE TEMP TABLE 'CD24' AS SELECT Conjugate FROM Catalog WHERE
Species IN ('Human') AND Conjugate NOT IN ('UNLB', 'Purified') AND
ProductName='CD24';

SELECT count(*) FROM
'CD4', 'CD19', 'CD20', 'CD21', 'CD22', 'CD23', 'CD24'

WHERE 'CD4'.Conjugate <> 'CD19'.Conjugate AND 'CD4'.Conjugate <>
'CD20'.Conjugate AND 'CD4'.Conjugate <> 'CD21'.Conjugate AND
'CD4'.Conjugate <> 'CD22'.Conjugate AND 'CD4'.Conjugate <>
'CD23'.Conjugate AND 'CD4'.Conjugate <> 'CD24'.Conjugate AND
'CD19'.Conjugate <> 'CD20'.Conjugate AND 'CD19'.Conjugate <>
'CD21'.Conjugate AND 'CD19'.Conjugate <> 'CD22'.Conjugate AND
'CD19'.Conjugate <> 'CD23'.Conjugate AND 'CD19'.Conjugate <>
'CD24'.Conjugate AND 'CD20'.Conjugate <> 'CD21'.Conjugate AND
'CD20'.Conjugate <> 'CD22'.Conjugate AND 'CD20'.Conjugate <>
'CD23'.Conjugate AND 'CD20'.Conjugate <> 'CD24'.Conjugate AND
'CD21'.Conjugate <> 'CD22'.Conjugate AND 'CD21'.Conjugate <>
'CD23'.Conjugate AND 'CD21'.Conjugate <> 'CD24'.Conjugate AND
'CD22'.Conjugate <> 'CD23'.Conjugate AND 'CD22'.Conjugate <>
'CD24'.Conjugate AND 'CD23'.Conjugate <> 'CD24'.Conjugate

ORDER BY 'CD4'.Conjugate,'CD19'.Conjugate, 'CD20'.Conjugate,
'CD21'.Conjugate, 'CD22'.Conjugate, 'CD23'.Conjugate,
'CD24'.Conjugate;

DROP TABLE 'CD4';
DROP TABLE 'CD19';
DROP TABLE 'CD20';
DROP TABLE 'CD21';
DROP TABLE 'CD22';
DROP TABLE 'CD23';
DROP TABLE 'CD24';

The second approach uses nested select statements rather than temp
tables. The logic is the same as the query above, only that the temp
tables have been replaced by the equivalent select statements. The
performance of this query is horrible, executing in > 3 minutes.

SELECT count(*) FROM
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD4') AS 'CD4',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD19') AS
'CD19',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD20') AS
'CD20',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD21') AS
'CD21',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD22') AS
'CD22',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD23') AS
'CD23',
  (SELECT Conjugate FROM Catalog WHERE Species IN ('Human') AND
Conjugate NOT IN ('UNLB', 'Purified') AND ProductName='CD24') AS
'CD24'

WHERE 'CD4'.Conjugate <> 'CD19'.Conjugate AND 'CD4'.Conjugate <>
'CD20'.Conjugate AND 'CD4'.Conjugate <> 'CD21'.Conjugate AND
'CD4'.Conjugate <> 'CD22'.Conjugate AND 'CD4'.Conjugate <>
'CD23'.Conjugate AND 'CD4'.Conjugate <> 'CD24'.Conjugate AND
'CD19'.Conjugate <> 'CD20'.Conjugate AND 'CD19'.Conjugate <>
'CD21'.Conjugate AND 'CD19'.Conjugate <> 'CD22'.Conjugate AND
'CD19'.Conjugate <> 'CD23'.Conjugate AND 'CD19'.Conjugate <>
'CD24'.Conjugate AND 'CD20'.Conjugate <> 'CD21'.Conjugate AND
'CD20'.Conjugate <> 'CD22'.Conjugate AND 'CD20'.Conjugate <>
'CD23'.Conjugate AND 'CD20'.Conjugate <> 'CD24'.Conjugate AND
'CD21'.Conjugate <> 'CD22'.Conjugate AND 'CD21'.Conjugate <>
'CD23'.Conjugate AND 'CD21'.Conjugate <> 'CD24'.Conjugate AND
'CD22'.Conjugate <> 'CD23'.Conjugate AND 'CD22'.Conjugate <>
'CD24'.Conjugate AND 'CD23'.Conjugate <> 'CD24'.Conjugate

ORDER BY 'CD4'.Conjugate, 'CD19'.Conjugate, 'CD20'.Conjugate,
'CD21'.Conjugate, 'CD22'.Conjugate, 'CD23'.Conjugate,
'CD24'.Conjugate;

I've observed these execution times using the sqlite command line
utility. Why is this? Is my index on the catalog table not being used
for some reason?

I would like to avoid creating temp tables in the DB because of issues
with the Qt SQLite driver I'm using which keeps open connections to
the DB and locks these temp tables during some operations.

Below is my database schema:

DROP TABLE IF EXISTS catalog;
CREATE TABLE catalog(
  PartNumber TEXT PRIMARY KEY,
  ProductName TEXT,
  Species TEXT,
  Conjugate TEXT,
  Allele TEXT,
  Format TEXT,
  Size TEXT,
  Application TEXT,
  ProductLine TEXT,
  Status TEXT,
  PageNumber TEXT,
  LotNumber TEXT,
  ExpirationDate TEXT,
  UserDefined INTEGER);

CREATE INDEX idx_cat ON catalog(Conjugate, PartNumber);

Thank You!
-Bryan
[EMAIL PROTECTED]
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to