This is what I am dealing with:
2 tables with exactly the same schema (but could be slightly
different, so can't put in same table):

CREATE TABLE TABLE1(
   [PATIENT_ID] INTEGER,
   [ENTRY_ID] INTEGER PRIMARY KEY,
   [READ_CODE] TEXT,
   [ADDED_DATE] TEXT,
   [START_DATE] TEXT)

The last 2 date fields are in the format yyyy-mm-dd

Again in both tables the following indexes:
CREATE INDEX IDX_TABLE1_PATIENT_ID ON TABLE1(PATIENT_ID)
CREATE INDEX IDX_TABLE1_PATIENT_ID_START_DATE ON TABLE1(PATIENT_ID, START_DATE)
CREATE INDEX IDX_TABLE1_PATIENT_ID_ENTRY_ID ON TABLE1(PATIENT_ID, ENTRY_ID)
CREATE INDEX TABLE1_START_DATE ON TABLE1(START_DATE)

Then some delete SQL's will be run so that in the end PATIENT_ID is
unique in both tables.

Then the SQL I was trying to improve:

DELETE
FROM
TABLE2
WHERE
PATIENT_ID NOT IN (
   SELECT
   TABLE1.PATIENT_ID
   FROM
   TABLE1
   WHERE
   JULIANDAY(TABLE2.START_DATE, '-14 month') >
   JULIANDAY(TABLE1.START_DATE) AND
   TABLE1.PATIENT_ID = TABLE2.PATIENT_ID)

The index it will use for this SQL is IDX_TABLE1_PATIENT_ID_ENTRY_ID
and that surprises me as ENTRY_ID is not in the above SQL. If I drop
that index then it will
use IDX_TABLE1_PATIENT_ID and only if that is dropped as well will it
use IDX_TABLE1_PATIENT_ID_START_DATE, which I thought would be the
best, but turns
out to be no better than the first. Whatever way I do this it is slow
and I can do it a lot quicker by doing this:

CREATE TEMP TABLE DATE_COMPARE_TEMP([E_ID] INTEGER)

INSERT INTO DATE_COMPARE_TEMP (E_ID)
   SELECT T2.ENTRY_ID FROM
   TABLE2 T2 INNER JOIN TABLE1 T2 ON
   (T1.PATIENT_ID = T2.PATIENT_ID)
   WHERE julianday(T2.START_DATE, '+15 month') >
         julianday(T1.START_DATE)

CREATE INDEX IDX1_DATE_COMPARE_TEMP_E_ID ON DATE_COMPARE_TEMP(E_ID)

analyze DATE_COMPARE_TEMP

and then the delete SQL like this:

DELETE
FROM
TABLE2
WHERE
ENTRY_ID NOT IN
   (SELECT E_ID FROM DATE_COMPARE_TEMP)

Although this involves a third temp table this method is about twice
as fast as the first one.
Maybe that is just the way it is and there just is no way to do this
as fast without the intermediate temp table, but I just wondered.
Again there is no problem here as the second method is simple and
fast. Just trying to increase my understanding of SQLite and indexes.


RBS




On Fri, Sep 18, 2009 at 8:25 PM, Simon Slavin
<slav...@hearsay.demon.co.uk> wrote:
>
> On 18 Sep 2009, at 4:49pm, bartsmissa...@blueyonder.co.uk wrote:
>
>>> Did something in the documentation make
>>> you think SQLite wouldn't use a
>>> multi-column index unless you forced it ?
>>
>> No, but I just noticed it didn't use the index I thought would be
>> best. As
>> it turned out it looks I was wrong in that that index didn't give the
>> quickest result.
>
> Ah.  Okay, that's fine.  It can be quite surprising what's best.  Glad
> you got a good result.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to