I am joining a large table (ENTRY_ATTRIBUTES) to a small table
(BPNewENTRY) and putting the resulting records in a third table, BP3.
The large table may have a few million records and the small table a
few hundred.
The join field is called ENTRY_ID in both tables; it has a non-unique
index in the large table and is the INTEGER PRIMARY KEY of the small
table, so the data type is integer in both fields.
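
For reference, the relevant bits of the schema look roughly like this
(the NUMERIC_VALUE and TYPE_SPECIFIC_INFO column types are just
placeholders; the real tables have more columns):

CREATE TABLE BPNewENTRY (
    ENTRY_ID INTEGER PRIMARY KEY
    -- other columns omitted
);

CREATE TABLE ENTRY_ATTRIBUTES (
    ENTRY_ID INTEGER,          -- joined against BPNewENTRY.ENTRY_ID
    NUMERIC_VALUE REAL,        -- placeholder type
    TYPE_SPECIFIC_INFO TEXT    -- placeholder type
    -- other columns omitted
);

CREATE INDEX IDX_ENTRY_ATTRIBUTES_ENTRY_ID ON ENTRY_ATTRIBUTES (ENTRY_ID);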

This is the SQL:

INSERT OR REPLACE INTO BP3
    (ENTRY_ID, NUMERIC_VALUE, UNIT)
SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
FROM ENTRY_ATTRIBUTES EA
INNER JOIN BPNewENTRY E ON EA.ENTRY_ID = E.ENTRY_ID

and this is the query plan SQLite generates for it (the output of
EXPLAIN QUERY PLAN):

0|0|0|SCAN TABLE ENTRY_ATTRIBUTES AS EA (~54855 rows)
0|1|1|SEARCH TABLE BPNewENTRY AS E USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)

Now I can speed up the query a lot by adding a WHERE clause with
the min(ENTRY_ID) of the small table:

INSERT OR REPLACE INTO BP3
    (ENTRY_ID, NUMERIC_VALUE, UNIT)
SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
FROM ENTRY_ATTRIBUTES EA
INNER JOIN BPNewENTRY E ON EA.ENTRY_ID = E.ENTRY_ID
WHERE EA.ENTRY_ID >= 4262936

and that will have this query plan:

0|0|0|SEARCH TABLE ENTRY_ATTRIBUTES AS EA USING INDEX IDX_ENTRY_ATTRIBUTES_ENTRY_ID (ENTRY_ID>?) (~18102 rows)
0|1|1|SEARCH TABLE BPNewENTRY AS E USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)

Getting this min(ENTRY_ID) is done in a separate query, and because it
runs against the small table it is very fast.
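
That preliminary query is something like the following; the 4262936
literal above is the value it returned:

SELECT min(ENTRY_ID) FROM BPNewENTRY;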

Now what surprises me is that SQLite does not apply this optimization
automatically.
I suppose I just overestimate the capabilities of the SQLite query planner.
Or would this be something that could be improved?
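
As an aside, the bound could presumably also be written inline as a
scalar subquery rather than computed in a separate query, though I have
not checked whether 3.7.5 still uses the index in that form:

INSERT OR REPLACE INTO BP3
    (ENTRY_ID, NUMERIC_VALUE, UNIT)
SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
FROM ENTRY_ATTRIBUTES EA
INNER JOIN BPNewENTRY E ON EA.ENTRY_ID = E.ENTRY_ID
WHERE EA.ENTRY_ID >= (SELECT min(ENTRY_ID) FROM BPNewENTRY)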

I am using SQLite version 3.7.5.


RBS