It certainly seems to improve things. I ran multiple tests, because portions of the data seem to get cached over time, so there is a difference between an initial run and subsequent runs. I almost wish simple table scans would always go straight to disk - I am not sure the caching always helps.
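For experimenting with that, the size of SQLite's own page cache can be dialed down per connection with PRAGMA cache_size (a negative value means KiB rather than pages, per the SQLite docs) - that won't defeat the OS file cache, but it does take SQLite's cache mostly out of the picture. A minimal sketch using Python's sqlite3 module; the value chosen here is just illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Negative values set the cache size in KiB rather than pages,
# so this caps SQLite's page cache at roughly 2 MB.
conn.execute("PRAGMA cache_size = -2000")

# Reading the pragma back confirms the setting took effect.
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -2000
conn.close()
```

Note this only affects the current connection; it does nothing about pages the operating system has cached from earlier runs.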
In any event - breaking the SQL up into 2 statements is still at least twice as fast. Where SQLite seems to run superfast all the time is :memory: tables - if you have the RAM. Maybe that's the real direction with large DBs: a giant RAM disk?

On Fri, May 25, 2012 at 12:10 PM, Simon Slavin <[email protected]> wrote:

>
> On 25 May 2012, at 7:38pm, Udi Karni <[email protected]> wrote:
>
> > I am running the following query -
> >
> > CREATE TABLE XYZ AS
> > SELECT ID, MIN (DATE)
> > FROM SOURCE-TABLE
> > WHERE CRITERIA
> > GROUP BY ID ;
> >
> > SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> > get grouped to 100,000 distinct IDs. There are no indexes as there is too
> > much variety in the WHERE clauses and index creation on this table takes
> > many hours. Temp_store=memory. When executed - the source table is read
> > very slowly, and runtime is 1/2 hour.
> >
> > I split this SQL into 2 steps - the first just extracting - the second
> > just grouping. The extract read the table extremely fast and finished in
> > 3 minutes. The GROUP BY ran in a few seconds. So a 10X improvement -
> > which is pretty incredible speed.
>
> Just for testing purposes, change MIN(DATE) to DATE and check the timings.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
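For anyone wanting to reproduce the split on a small scale, here is a sketch of both forms against an in-memory database via Python's sqlite3 module. The table name, columns, and WHERE condition are made up for illustration (the original post only gives the shape of the query); the point is just that extract-then-group returns the same rows as the single-statement version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER, date TEXT, flag INTEGER);
    INSERT INTO source_table VALUES
        (1, '2012-01-05', 1),
        (1, '2012-01-02', 1),
        (2, '2012-03-01', 1),
        (2, '2012-02-01', 0),
        (3, '2012-04-01', 0);
""")

# One-step version: filter and group in a single statement.
one_step = conn.execute("""
    SELECT id, MIN(date) FROM source_table
    WHERE flag = 1
    GROUP BY id
""").fetchall()

# Two-step version: extract the matching rows into a temp table
# first (with temp_store=memory this stays in RAM), then group.
conn.execute("""
    CREATE TEMP TABLE extracted AS
    SELECT id, date FROM source_table WHERE flag = 1
""")
two_step = conn.execute(
    "SELECT id, MIN(date) FROM extracted GROUP BY id"
).fetchall()

# Both forms produce the same (id, earliest date) pairs.
print(sorted(one_step) == sorted(two_step))  # True
conn.close()
```

Whether the two-step form is actually faster will of course depend on table size and I/O, as in the original test; this only shows the two are equivalent.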

