It certainly seems to improve things. I ran multiple tests, because portions of the data seem to get cached over time, so there is a difference between an initial run and subsequent runs. I almost wish simple table scans would always go straight to disk - I am not sure the caching always helps.
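For experimenting with that, the size of SQLite's own page cache can be dialed down per connection with PRAGMA cache_size (a negative value means KiB rather than pages, per the SQLite docs) - that won't defeat the OS file cache, but it does take SQLite's cache mostly out of the picture. A minimal sketch using Python's sqlite3 module; the value chosen here is just illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Negative values set the cache size in KiB rather than pages,
# so this caps SQLite's page cache at roughly 2 MB.
conn.execute("PRAGMA cache_size = -2000")

# Reading the pragma back confirms the setting took effect.
size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # -2000
conn.close()
```

Note this only affects the current connection; it does nothing about pages the operating system has cached from earlier runs.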
In any event - breaking the SQL up into 2 statements is still at least twice as fast. Where SQLite seems to run superfast all the time is :memory: tables - if you have the RAM. Maybe that's the real direction with large DBs: a giant RAM disk?

On Fri, May 25, 2012 at 12:10 PM, Simon Slavin <[email protected]> wrote:

>
> On 25 May 2012, at 7:38pm, Udi Karni <[email protected]> wrote:
>
> > I am running the following query -
> >
> > CREATE TABLE XYZ AS
> > SELECT ID, MIN (DATE)
> > FROM SOURCE-TABLE
> > WHERE CRITERIA
> > GROUP BY ID ;
> >
> > SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> > get grouped to 100,000 distinct IDs. There are no indexes as there is too
> > much variety in the WHERE clauses and index creation on this table takes
> > many hours. Temp_store=memory. When executed - the source table is read
> > very slowly, and runtime is 1/2 hour.
> >
> > I split this SQL into 2 steps - the first just extracting - the second
> > just grouping. The extract read the table extremely fast and finished in
> > 3 minutes. The GROUP BY ran in a few seconds. So a 10X improvement -
> > which is pretty incredible speed.
>
> Just for testing purposes, change MIN(DATE) to DATE and check the timings.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
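For anyone wanting to reproduce the split on a small scale, here is a sketch of both forms against an in-memory database via Python's sqlite3 module. The table name, columns, and WHERE condition are made up for illustration (the original post only gives the shape of the query); the point is just that extract-then-group returns the same rows as the single-statement version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER, date TEXT, flag INTEGER);
    INSERT INTO source_table VALUES
        (1, '2012-01-05', 1),
        (1, '2012-01-02', 1),
        (2, '2012-03-01', 1),
        (2, '2012-02-01', 0),
        (3, '2012-04-01', 0);
""")

# One-step version: filter and group in a single statement.
one_step = conn.execute("""
    SELECT id, MIN(date) FROM source_table
    WHERE flag = 1
    GROUP BY id
""").fetchall()

# Two-step version: extract the matching rows into a temp table
# first (with temp_store=memory this stays in RAM), then group.
conn.execute("""
    CREATE TEMP TABLE extracted AS
    SELECT id, date FROM source_table WHERE flag = 1
""")
two_step = conn.execute(
    "SELECT id, MIN(date) FROM extracted GROUP BY id"
).fetchall()

# Both forms produce the same (id, earliest date) pairs.
print(sorted(one_step) == sorted(two_step))  # True
conn.close()
```

Whether the two-step form is actually faster will of course depend on table size and I/O, as in the original test; this only shows the two are equivalent.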

