D. Richard Hipp wrote:
> Dennis Cote wrote:
>> Hi All,
>>
>> I have a situation where populating data in a temp table takes a
>> long time, about 8 minutes, and I am wondering if anyone can offer
>> any suggestions for ways to speed it up.
>>
>> I have the permanent table and associated index, shown below, which
>> holds tree structured component data. Each node is related to its
>> parent by the container_id field.
>>
>> create table Component (
>> component_id integer primary key,
>> component_kind_id integer not null references
>> Component_Kind, component_number integer not null, --
>> number of this
>> component -- within its container ... component_function_id
>> integer default 0 references Component_Function, unique
>> (container_id, component_kind_id, component_number) );
>
> A primary key is always unique. So putting the primary key as
> the first element of a UNIQUE index accomplishes no good. But
> it does make the database larger and it makes it run slowly
> by given the database engine more things to have to keep
> up-to-date.
>
Richard,
I think you misread my SQL. The primary key is component_id, the first
component of the unique constraint is container_id. These are different
fields. The unique constraint is there to enforce a real application
constraint.
>>
>> --index used to speed container lookups
>> create index Component_Container_Index on Component (container_id);
>
> An index that is on the primary key is redundant. This is especially
> true of an INTEGER PRIMARY KEY in SQLite. The index will never be
> used to speed up a search. (If it where used, it would only slow the
> search down.) But CPU cycles and disk space are consumed maintaining
> the index all the same. So an index on a primary key is a net loss.
>
Same thing here. The index is on the field container_id which is NOT the
primary key. In fact this index is very useful for fast container based
lookups as below and in other places in my application.
>>
>> and I populate the temp table with the following SQL statement;
>>
>> insert into Component_Count
>> select sub.id, sub_count, peer_count
>> from
>> (select pid as id, count(nid) as sub_count
>> from
>> (select p.component_id as pid, n.component_id as nid
>> from Component p left outer join Component n on
>> p.component_id = n.container_id)
>> group by pid) as sub
>> join
>> (select nid as id, count(pid) as peer_count
>> from
>> (select n.component_id as nid, p.component_id as pid
>> from Component n left outer join Component p
>> on (n.component_id != p.component_id
>> and n.container_id = p.container_id
>> and n.component_kind_id = p.component_kind_id))
>> group by id) as peer
>> using (id);
>>
>
> Joining two unindexed tables (and the result of a subquery is not
> going to have an index) will always be an O(N**2) operation. You can
> change it into an O(NlogN) operation by storing the results of
> subqueries in separate temporary tables, creating indices on those
> temporary tables, then doing the join on the temporary tables.
Thanks for the hint about creating indexes on the temp tables.
I had discovered that I could create separate temp tables with the required
counts using the following SQL statements. These statements execute in
seconds (or less, its hard to tell).
create temp table container_count as
select container_id as component_id, count(*) as sub_count
from Component group by container_id;
create temp table container_kind_count as
select container_id, component_kind_id, count(*) as peer_count
from Component group by container_id, component_kind_id;
But a join to combine the component ids with the sub component counts like
this;
select component.component_id, sub_count from
component left outer join container_count using(component_id);
still took a very long time.
By issuing a separate create statement for the container_count table, one
with the component id declared as a integer primary key (which effectively
creates an index on that table), and then filling the table with an "insert
as select..." I got an equivalent temp table with an index. Since the
component table already has an index on component_id (it's also an integer
primary key), my join statement now executes in about a second as well. I
now do this;
create temp table container_count (
component_id integer primary key,
sub_count integer
);
insert into container_count
select container_id, count(*) as sub_count from Component group by
container_id;
select component.component_id, sub_count from
component left outer join container_count using(component_id);
which executes in a second or two for the same 30,000 record table.
Now all I need to do is combine this table with the peer counts.
SQLite rocks!