D. Richard Hipp wrote:
> Dennis Cote wrote:
>> Hi All,
>>
>> I have a situation where populating data in a temp table takes a
>> long time, about 8 minutes, and I am wondering if anyone can offer
>> any suggestions for ways to speed it up.
>>
>> I have the permanent table and associated index, shown below, which
>> holds tree structured component data. Each node is related to its
>> parent by the container_id field.
>>
>> create table Component (
>>     component_id            integer primary key,
>>     component_kind_id       integer not null references
>>     Component_Kind, component_number        integer not null,   --
>>                                                 number of this
>>     component --  within its container ... component_function_id
>>     integer default 0 references Component_Function, unique
>> (container_id, component_kind_id, component_number) );
>
> A primary key is always unique.  So putting the primary key as
> the first element of a UNIQUE index accomplishes no good.  But
> it does make the database larger and it makes it run slowly
> by given the database engine more things to have to keep
> up-to-date.
>

Richard,

I think you misread my SQL. The primary key is component_id, the first
component of the unique constraint is container_id. These are different
fields. The unique constraint is there to enforce a real application
constraint.

>>
>> --index used to speed container lookups
>> create index Component_Container_Index on Component (container_id);
>
> An index that is on the primary key is redundant.  This is especially
> true of an INTEGER PRIMARY KEY in SQLite.  The index will never be
> used to speed up a search.  (If it where used, it would only slow the
> search down.)  But CPU cycles and disk space are consumed maintaining
> the index all the same.  So an index on a primary key is a net loss.
>

Same thing here. The index is on the field container_id which is NOT the
primary key. In fact this index is very useful for fast container based
lookups as below and in other places in my application.

>>
>> and I populate the temp table with the following SQL statement;
>>
>> insert into Component_Count
>>     select sub.id, sub_count, peer_count
>>     from
>>         (select pid as id, count(nid) as sub_count
>>         from
>>             (select p.component_id as pid, n.component_id as nid
>>             from Component p left outer join Component n on
>> p.component_id = n.container_id)
>>         group by pid) as sub
>>     join
>>         (select nid as id, count(pid) as peer_count
>>         from
>>             (select n.component_id as nid, p.component_id as pid
>>             from Component n left outer join Component p
>>                 on (n.component_id != p.component_id
>>                 and n.container_id = p.container_id
>>                 and n.component_kind_id = p.component_kind_id))
>>         group by id) as peer
>>     using (id);
>>
>
> Joining two unindexed tables (and the result of a subquery is not
> going to have an index) will always be an O(N**2) operation.  You can
> change it into an O(NlogN) operation by storing the results of
> subqueries in separate temporary tables, creating indices on those
> temporary tables, then doing the join on the temporary tables.

Thanks for the hint about creating indexes on the temp tables.

I had discovered that I could create separate temp tables with the required
counts using the following SQL statements. These statements execute in
seconds (or less, its hard to tell).

create temp table container_count as
    select container_id as component_id, count(*) as sub_count
    from Component group by container_id;

create temp table container_kind_count as
    select container_id, component_kind_id, count(*) as peer_count
    from Component group by container_id, component_kind_id;

But a join to combine the component ids with the sub component counts like
this;

select component.component_id, sub_count from
       component left outer join container_count using(component_id);

still took a very long time.

By issuing a separate create statement for the container_count table, one
with the component id declared as a integer primary key (which effectively
creates an index on that table), and then filling the table with an "insert
as select..." I got an equivalent temp table with an index. Since the
component table already has an index on component_id (it's also an integer
primary key), my join statement now executes in about a second as well. I
now do this;

create temp table container_count (
    component_id    integer primary key,
    sub_count       integer
);

insert into container_count
    select container_id, count(*) as sub_count from Component group by
container_id;

select component.component_id, sub_count from
       component left outer join container_count using(component_id);

which executes in a second or two for the same 30,000 record table.

Now all I need to do is combine this table with the peer counts.

SQLite rocks!

Reply via email to