After posting my question, I found the discussion of how aggregate operations are performed in the VDBE Tutorial; that implies that memory usage will correspond with the number of unique keys encountered by the query, but I appreciate having it stated explicitly.
How difficult would it be, in concept, to change the storage of the hash buckets from in-memory to on-disk? The VDBE Tutorial makes it sound like it would be a matter of changing the AggFocus, AggNext, and maybe a few other operations to store/retrieve buckets from disk before operating on them in memory. How dramatically am I oversimplifying this? :) Thanks -Tom > -----Original Message----- > From: D. Richard Hipp [mailto:[EMAIL PROTECTED] > Sent: Thursday, March 24, 2005 10:32 AM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] Memory usage for queries containing a > GROUP BY clause > > On Thu, 2005-03-24 at 10:09 -0500, Thomas Briggs wrote: > > > I have a 1GB database containing a single table. Simple queries > > against this table (SELECT COUNT(*), etc.) run without > using more than a > > few MBs of memory; the amount used seems to correspond > directly with the > > size of the page cache, as I expected. If I execute an > aggregate query > > that contains a GROUP BY clause, however, the memory usage > seems to jump > > quite a bit - the memory usage seems to correlate with the number of > > columns in the GROUP BY. Grouping by three columns using a couple > > hundred megs of memory; grouping by eight columns uses more > than 1.3GB!. > > > > The entire results set of an aggregate query is held in > memory. The only way to limit the memory usage of an > aggregate query is to construct the query in such a way > that the number of rows in the result set is limited. > If you are using 1.3GB of memory to do an aggregate query, > my guess is that there are a very large number of rows in > the result set. > > It is, in theory, possible to reduce the memory requirements > for an aggregate query if the GROUP BY clause specifies columns > that are indexed. But SQLite does not currently implement > that optimization. You can implement the optimization manually, > though, by turning the GROUP BY clause into an ORDER BY clause, > dropping the aggregate functions, then doing the aggregation > manually as the rows come out of the database in order. > > -- > D. Richard Hipp <[EMAIL PROTECTED]> > >