Have checked, and a missing ANALYZE is indeed not the cause of this difference in query speed.
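(For the record, one quick way to confirm that statistics are actually present after ANALYZE is to look at sqlite_stat1; a minimal check, with the table name taken from the thread below:

ANALYZE;
SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 'ENTRY_ATTRIBUTES';

If no row comes back for ENTRY_ATTRIBUTES, the planner has no statistics for that table.)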
RBS

On Tue, Jul 12, 2011 at 9:06 PM, Bart Smissaert <bart.smissa...@gmail.com> wrote:
>> it thinks that scanning the whole ENTRY_ATTRIBUTES table will read
>> about 54855 rows. And you say that it has many more rows.
>
> This particular database has fewer rows; the millions I mentioned are
> in a different database. I think the figures are right, but will check.
>
>> I think running ANALYZE on your database
>
> ANALYZE was done, but will double-check.
>
> RBS
>
>
> On Tue, Jul 12, 2011 at 8:59 PM, Pavel Ivanov <paiva...@gmail.com> wrote:
>>> Now what surprises me is that this optimization is not done
>>> automatically by SQLite.
>>> I suppose I just overestimate the capabilities of the SQLite plan
>>> generator.
>>> Or, would this be something that could be improved?
>>
>> It's a very non-obvious optimization, and I think another type of
>> optimization will give a much better result in your case.
>> To understand the optimizer's behaviour, look at the numbers it shows you:
>> it thinks that scanning the whole ENTRY_ATTRIBUTES table will read
>> about 54855 rows, and you say that it has many more rows. That's why
>> the optimizer selects a sub-optimal plan.
>> I think running ANALYZE on your database should fix the selected plans,
>> and even the first query will run much faster.
>>
>>
>> Pavel
>>
>>
>> On Tue, Jul 12, 2011 at 3:39 PM, Bart Smissaert
>> <bart.smissa...@gmail.com> wrote:
>>> Joining a large table (ENTRY_ATTRIBUTES) and a small table
>>> (BPNewENTRY) and putting the resulting records in a third table, BP3.
>>> The large table may have a few million records and the small table a
>>> few hundred records.
>>> The join field is called ENTRY_ID in both tables; it has a
>>> non-unique index in the large table and is the integer primary key
>>> in the small table. The data type is integer in both these fields.
>>>
>>> This is the SQL:
>>>
>>> INSERT OR REPLACE INTO BP3
>>> (ENTRY_ID, NUMERIC_VALUE, UNIT)
>>> SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
>>> FROM
>>> ENTRY_ATTRIBUTES EA INNER JOIN BPNewENTRY E ON
>>> (EA.ENTRY_ID = E.ENTRY_ID)
>>>
>>> and this is the generated query plan for that:
>>>
>>> 0--0--0--SCAN TABLE ENTRY_ATTRIBUTES AS EA (~54855 rows) <-->
>>> 0--1--1--SEARCH TABLE BPNewENTRY AS E USING INTEGER PRIMARY KEY
>>> (rowid=?) (~1 rows)
>>>
>>> Now I can speed up the query a lot by adding a WHERE clause with
>>> the min(ENTRY_ID) of the small table:
>>>
>>> INSERT OR REPLACE INTO BP3
>>> (ENTRY_ID, NUMERIC_VALUE, UNIT)
>>> SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
>>> FROM
>>> ENTRY_ATTRIBUTES EA INNER JOIN BPNewENTRY E ON
>>> (EA.ENTRY_ID = E.ENTRY_ID)
>>> WHERE
>>> EA.ENTRY_ID >= 4262936
>>>
>>> and that will have this query plan:
>>>
>>> 0--0--0--SEARCH TABLE ENTRY_ATTRIBUTES AS EA USING INDEX
>>> IDX_ENTRY_ATTRIBUTES_ENTRY_ID (ENTRY_ID>?) (~18102 rows) <-->
>>> 0--1--1--SEARCH TABLE BPNewENTRY AS E USING INTEGER PRIMARY KEY
>>> (rowid=?) (~1 rows)
>>>
>>> Getting this min(ENTRY_ID) is done in a separate query, and because it
>>> is done on the small table it is very fast.
>>>
>>> Now what surprises me is that this optimization is not done
>>> automatically by SQLite.
>>> I suppose I just overestimate the capabilities of the SQLite plan
>>> generator.
>>> Or, would this be something that could be improved?
>>>
>>> I am using SQLite version 3.7.5.
>>>
>>>
>>> RBS
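For reference, the separate min(ENTRY_ID) step described in the quoted message could in principle be folded into the single INSERT with a scalar subquery; an untested sketch, using the table and column names from the thread (whether the 3.7.5 planner then uses IDX_ENTRY_ATTRIBUTES_ENTRY_ID for the range constraint would need to be checked with EXPLAIN QUERY PLAN):

INSERT OR REPLACE INTO BP3
(ENTRY_ID, NUMERIC_VALUE, UNIT)
SELECT EA.ENTRY_ID, EA.NUMERIC_VALUE, EA.TYPE_SPECIFIC_INFO
FROM
ENTRY_ATTRIBUTES EA INNER JOIN BPNewENTRY E ON
(EA.ENTRY_ID = E.ENTRY_ID)
WHERE
EA.ENTRY_ID >= (SELECT MIN(ENTRY_ID) FROM BPNewENTRY)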