D. Richard Hipp wrote: > Dennis Cote wrote: >> Hi All, >> >> I have a situation where populating data in a temp table takes a >> long time, about 8 minutes, and I am wondering if anyone can offer >> any suggestions for ways to speed it up. >> >> I have the permanent table and associated index, shown below, which >> holds tree structured component data. Each node is related to its >> parent by the container_id field. >> >> create table Component ( >> component_id integer primary key, >> component_kind_id integer not null references >> Component_Kind, component_number integer not null, -- >> number of this >> component -- within its container ... component_function_id >> integer default 0 references Component_Function, unique >> (container_id, component_kind_id, component_number) ); > > A primary key is always unique. So putting the primary key as > the first element of a UNIQUE index accomplishes no good. But > it does make the database larger and it makes it run slowly > by given the database engine more things to have to keep > up-to-date. >
Richard, I think you misread my SQL. The primary key is component_id, the first component of the unique constraint is container_id. These are different fields. The unique constraint is there to enforce a real application constraint. >> >> --index used to speed container lookups >> create index Component_Container_Index on Component (container_id); > > An index that is on the primary key is redundant. This is especially > true of an INTEGER PRIMARY KEY in SQLite. The index will never be > used to speed up a search. (If it where used, it would only slow the > search down.) But CPU cycles and disk space are consumed maintaining > the index all the same. So an index on a primary key is a net loss. > Same thing here. The index is on the field container_id which is NOT the primary key. In fact this index is very useful for fast container based lookups as below and in other places in my application. >> >> and I populate the temp table with the following SQL statement; >> >> insert into Component_Count >> select sub.id, sub_count, peer_count >> from >> (select pid as id, count(nid) as sub_count >> from >> (select p.component_id as pid, n.component_id as nid >> from Component p left outer join Component n on >> p.component_id = n.container_id) >> group by pid) as sub >> join >> (select nid as id, count(pid) as peer_count >> from >> (select n.component_id as nid, p.component_id as pid >> from Component n left outer join Component p >> on (n.component_id != p.component_id >> and n.container_id = p.container_id >> and n.component_kind_id = p.component_kind_id)) >> group by id) as peer >> using (id); >> > > Joining two unindexed tables (and the result of a subquery is not > going to have an index) will always be an O(N**2) operation. You can > change it into an O(NlogN) operation by storing the results of > subqueries in separate temporary tables, creating indices on those > temporary tables, then doing the join on the temporary tables. Thanks for the hint about creating indexes on the temp tables. I had discovered that I could create separate temp tables with the required counts using the following SQL statements. These statements execute in seconds (or less, its hard to tell). create temp table container_count as select container_id as component_id, count(*) as sub_count from Component group by container_id; create temp table container_kind_count as select container_id, component_kind_id, count(*) as peer_count from Component group by container_id, component_kind_id; But a join to combine the component ids with the sub component counts like this; select component.component_id, sub_count from component left outer join container_count using(component_id); still took a very long time. By issuing a separate create statement for the container_count table, one with the component id declared as a integer primary key (which effectively creates an index on that table), and then filling the table with an "insert as select..." I got an equivalent temp table with an index. Since the component table already has an index on component_id (it's also an integer primary key), my join statement now executes in about a second as well. I now do this; create temp table container_count ( component_id integer primary key, sub_count integer ); insert into container_count select container_id, count(*) as sub_count from Component group by container_id; select component.component_id, sub_count from component left outer join container_count using(component_id); which executes in a second or two for the same 30,000 record table. Now all I need to do is combine this table with the peer counts. SQLite rocks!