*I'm leaving aside the rant that your first delete is not identical to
combination of the select and delete in the second approach and select
in second approach contains typos...*

But did you try to combine your insert and delete statements from the
second approach? This approach quicker because of exactly that -
select and delete statements are independent whereas in first approach
your select is executed again and again for each row in TABLE2. So
just make it like this:

DELETE
FROM
TABLE2
WHERE
ENTRY_ID NOT IN
  (SELECT T2.ENTRY_ID FROM
  TABLE2 T2 INNER JOIN TABLE1 T1 ON
  (T1.PATIENT_ID = T2.PATIENT_ID)
  WHERE julianday(T2.START_DATE, '+15 month') >
        julianday(T1.START_DATE)
  )

And about indexes: for this query index on (PATIENT_ID, START_DATE)
doesn't do any better than on (PATIENT_ID) because SQLite must to
check all rows with given PATIENT_ID anyway. And that is because
START_DATE is in the query inside function call to julianday(). Index
on 2 fields could help only if your condition was e.g. T2.START_DATE >
T1.START_DATE.
And in most cases there's no benefit creating index on 1 field
(PATIENT_ID) when there's already index on 2 fields (PATIENT_ID,
START_DATE) where PATIENT_ID is the first field.

Pavel

On Fri, Sep 18, 2009 at 4:46 PM, Bart Smissaert
<bart.smissa...@gmail.com> wrote:
> This is what I am dealing with:
> 2 tables with exactly the same schema (but could be slightly
> different, so can't put in same table):
>
> CREATE TABLE TABLE1(
>   [PATIENT_ID] INTEGER,
>   [ENTRY_ID] INTEGER PRIMARY KEY,
>   [READ_CODE] TEXT,
>   [ADDED_DATE] TEXT,
>   [START_DATE] TEXT)
>
> The last 2 date fields are in the format yyyy-mm-dd
>
> Again in both tables the following indexes:
> CREATE INDEX IDX_TABLE1_PATIENT_ID ON TABLE1(PATIENT_ID)
> CREATE INDEX IDX_TABLE1_PATIENT_ID_START_DATE ON TABLE1(PATIENT_ID, 
> START_DATE)
> CREATE INDEX IDX_TABLE1_PATIENT_ID_ENTRY_ID ON TABLE1(PATIENT_ID, ENTRY_ID)
> CREATE INDEX TABLE1_START_DATE ON TABLE1(START_DATE)
>
> Then some delete SQL's will be run so that in the end PATIENT_ID is
> unique in both tables.
>
> Then the SQL I was trying to improve:
>
> DELETE
> FROM
> TABLE2
> WHERE
> PATIENT_ID NOT IN (
>   SELECT
>   TABLE1.PATIENT_ID
>   FROM
>   TABLE1
>   WHERE
>   JULIANDAY(TABLE2.START_DATE, '-14 month') >
>   JULIANDAY(TABLE1.START_DATE) AND
>   TABLE1.PATIENT_ID = TABLE2.PATIENT_ID)
>
> The index it will use for this SQL is IDX_TABLE1_PATIENT_ID_ENTRY_ID
> and that surprises me as ENTRY_ID is not in the above SQL. If I drop
> that index then it will
> use IDX_TABLE1_PATIENT_ID and only if that is dropped as well will it
> use IDX_TABLE1_PATIENT_ID_START_DATE, which I thought would be the
> best, but turns
> out to be no better than the first. Whatever way I do this it is slow
> and I can do it a lot quicker by doing this:
>
> CREATE TEMP TABLE DATE_COMPARE_TEMP([E_ID] INTEGER)
>
> INSERT INTO DATE_COMPARE_TEMP (E_ID)
>   SELECT T2.ENTRY_ID FROM
>   TABLE2 T2 INNER JOIN TABLE1 T2 ON
>   (T1.PATIENT_ID = T2.PATIENT_ID)
>   WHERE julianday(T2.START_DATE, '+15 month') >
>         julianday(T1.START_DATE)
>
> CREATE INDEX IDX1_DATE_COMPARE_TEMP_E_ID ON DATE_COMPARE_TEMP(E_ID)
>
> analyze DATE_COMPARE_TEMP
>
> and then the delete SQL like this:
>
> DELETE
> FROM
> TABLE2
> WHERE
> ENTRY_ID NOT IN
>   (SELECT E_ID FROM DATE_COMPARE_TEMP)
>
> Although this involves a third temp table this method is about twice
> as fast as the first one.
> Maybe that is just the way it is and there just is no way to do this
> as fast without the intermediate temp table, but I just wondered.
> Again there is no problem here as the second method is simple and
> fast. Just trying to increase my understanding of SQLite and indexes.
>
>
> RBS
>
>
>
>
> On Fri, Sep 18, 2009 at 8:25 PM, Simon Slavin
> <slav...@hearsay.demon.co.uk> wrote:
>>
>> On 18 Sep 2009, at 4:49pm, bartsmissa...@blueyonder.co.uk wrote:
>>
>>>> Did something in the documentation make
>>>> you think SQLite wouldn't use a
>>>> multi-column index unless you forced it ?
>>>
>>> No, but I just noticed it didn't use the index I thought would be
>>> best. As
>>> it turned out it looks I was wrong in that that index didn't give the
>>> quickest result.
>>
>> Ah.  Okay, that's fine.  It can be quite surprising what's best.  Glad
>> you got a good result.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to