--- [EMAIL PROTECTED] wrote:
> > Joe Wilson <[EMAIL PROTECTED]> wrote:
> > > Anyone have any ideas how to speed up GROUP BY on huge views
> > > in recent versions of SQLite?
> > >
> > >   http://www.sqlite.org/cvstrac/tktview?tn=1809
> > >
> > > The older versions of SQLite (prior to SQLite 3.2.6) used to
> > > perform GROUP BY operations in the main table loop, grouping
> > > rows as it went along. But the new version appears to create
> > > a huge temp table of results, sort them, and only then determine
> > > the groups. This takes up a massive amount of temp store.
> > > The old GROUP BY algorithm excels in the case where there are
> > > a lot of groupable rows in the dataset; i.e., the typical case.
> > >
> >
> > The old algorithm required enough memory to hold the
> > entire results set.  The new algorithm uses bounded
> > memory, which is a very important advantage if you
> > have a GROUP BY with a large result set.

Temp store certainly is not bounded.

There's a huge increase in temp store due to the new algorithm 
used in GROUP BY. In one simple case temp store memory use is 
190 times larger - from 1.2M to 230M.  I did not mention the 
cases that do not run at all with the latest SQLite due to lack 
of disk space. The same cases run in under a minute in SQLite 
3.2.1 using < 100 megs of RAM and no temp store.

As you know, the old GROUP BY algorithm did not hold the 
entire intermediate result set in memory - just the unique rows 
and aggregated intermediate results. The new algorithm no longer 
employs this extremely efficient time/space optimization.

> > The new algorithm will run much faster if you have
> > an index on the GROUP BY term(s).

...but if you cannot use an index because the items being 
SELECTed/GROUPed are expressions, you pay a very large penalty
with the new GROUP BY algorithm - the query can run 100 times 
slower or more if your data set is in the tens of millions of 
rows.

The old GROUP BY algorithm was extremely efficient whether or 
not the GROUP BY terms were indexable or not.  

Can't the old 3.2.5 algorithm for GROUP BY be reinstated but with 
its temporary intermediate results stored in temp store instead of 
memory? That way you get the best features of both algorithms.


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Reply via email to