Re: [sqlite] Improving performance of GROUP BY

Nico Williams Tue, 29 May 2012 15:04:13 -0700

On Fri, May 25, 2012 at 1:38 PM, Udi Karni <uka...@gmail.com> wrote:
> I am running the following query -
>
> CREATE TABLE XYZ AS
> SELECT ID, MIN (DATE)
> FROM SOURCE-TABLE
> WHERE CRITERIA
> GROUP BY ID ;
>
> SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> get grouped to 100,000 distinct IDs. There are no indexes as there is too
> much variety in the WHERE clauses and index creation on this table takes
> many hours. Temp_store=memory. When executed - the source table is read
> very slowly, and runtime is 1/2 hour.
>
> I split this SQL into 2 steps - the first just extracting - the second just
> grouping. The extract read the table extremely fast and finished in 3
> minutes. The GROUP BY ran in a few seconds. So a 10X improvement - which is
> pretty incredible speed.


Meaning?  Something like this:

CREATE TABLE FOO AS SELECT ID, DATE FROM SOURCE_TABLE;
CREATE TABLE BAR AS SELECT ID, MIN(DATE) FROM FOO GROUP BY ID;

?

Are you sure that there are no indexes?  Maybe there are indexes
implied by UNIQUE constraints and maybe SQLite3 is incorrectly
deciding to scan the index instead of the table?

My guess is that you have a UNIQUE constraint on ID in SOURCE_TABLE
and SQLite3 is scanning the index and then fetching the DATE from the
table.  But this makes no sense because then it makes no sense to
select the min() of DATE as there would be a single row per-ID.  My
guess has to be wrong.  Maybe there's a UNIQUE constraint on several
columns including ID?  That would not be incompatible with the
semantics of your query.

Can you post EXPLAIN QUERY PLAN output for your statements?  And/or your schema.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Improving performance of GROUP BY

Reply via email to