We have a rude hack that spills hashed aggregates out to temp files on
disk to deal with the case where there is not enough memory. I don't
think it's an ideal solution, but it certainly has the code to dump to
file. I can post the patch later in the day.
(This is some code from our undergrad db class assignment.)
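A rough sketch of the idea in Python (purely illustrative — the patch itself is not shown here, and the real executor works on tuples and transition states, not dicts): when the in-memory hash table is full, rows for unseen keys are dumped to a temp file and aggregated in additional passes.

```python
import pickle
import tempfile

def hashed_agg_with_spill(rows, max_groups):
    """SUM-per-key aggregation with a bounded in-memory hash table.

    Rows whose key is not in the (full) table are spilled to a temp
    file and handled in a later pass. Hypothetical sketch only.
    """
    results = {}
    pending = iter(rows)
    while True:
        table = {}
        spill = tempfile.TemporaryFile()
        spilled = False
        for key, value in pending:
            if key in table:
                table[key] += value               # group already in memory
            elif len(table) < max_groups:
                table[key] = value                # room for a new group
            else:
                pickle.dump((key, value), spill)  # no room: defer to next pass
                spilled = True
        results.update(table)                     # keys seen here never spilled
        if not spilled:
            spill.close()
            return results
        spill.seek(0)
        pending = _read_spill(spill)

def _read_spill(f):
    # Yield pickled (key, value) pairs back out of a spill file.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return
```

A key can never appear both in `results` and in a spill file: a row is only deferred when its key is absent from the current table, so each pass settles a disjoint set of groups.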
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostgreSQL. I don't actually own the proper SQL99
specifications, and since the newer SQL:2003 ones are coming out I don't
know which to purchase. Could someone point me in the right direction?
Robert Bedell [EMAIL PROTECTED] writes:
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostgreSQL.
There was a fairly crude CUBE implementation submitted (and rejected) a
few months ago, but there's not been any work I thought had a chance of
being accepted.
Robert Bedell wrote:
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostgreSQL. I don't actually own the proper SQL99
specifications, and since the newer SQL:2003 ones are coming out I don't
know which to purchase. Could someone point me in the right direction?
More specifically, I would like to add grouping sets and the CUBE and
ROLLUP operators to PostgreSQL. Since modifying the GROUP BY operation
would necessitate changing the query structure, wouldn't that affect the
query rewrites and the genetic optimizer?
I don't think either the rewriter or the genetic optimizer would be
affected.
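For reference, SQL99 defines CUBE and ROLLUP as shorthand for lists of grouping sets: every subset of the grouped columns, and every prefix of them, respectively. A minimal Python model of that expansion and a naive per-set evaluation (illustrative only — not proposed implementation code):

```python
from itertools import combinations

def cube_sets(columns):
    # CUBE(c1, ..., cn) is shorthand for GROUPING SETS over every
    # subset of the columns (2**n sets, including the empty set).
    return [list(s) for r in range(len(columns), -1, -1)
            for s in combinations(columns, r)]

def rollup_sets(columns):
    # ROLLUP(c1, ..., cn) is shorthand for every prefix (n + 1 sets).
    return [columns[:i] for i in range(len(columns), -1, -1)]

def grouping_sets(rows, sets, value_col):
    # Naive evaluation: one hashed SUM aggregation per grouping set.
    out = {}
    for cols in sets:
        groups = {}
        for row in rows:
            key = tuple(row[c] for c in cols)
            groups[key] = groups.get(key, 0) + row[value_col]
        out[tuple(cols)] = groups
    return out
```

Under this rewriting the parser expands CUBE/ROLLUP into plain grouping-set lists, so only the aggregation stage has to know about multiple simultaneous GROUP BYs.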
Tom Lane wrote on Thu, 18.12.2003 at 00:27:
Robert Bedell [EMAIL PROTECTED] writes:
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostgreSQL.
As a first project one could think of implementing NULLS FIRST/LAST
(from 4.14.9) for all ORDER BY clauses.
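Clause 4.14.9's NULLS FIRST/LAST simply pins NULL sort keys to one end of the ordering, independent of the sort direction applied to the non-null values. A toy Python model of the semantics (the flag defaults below are arbitrary choices for the sketch, not SQL's defaults):

```python
def order_by(values, descending=False, nulls_first=False):
    # SQL-style ordering of a single key column, with None standing in
    # for NULL: the NULLs go to one chosen end, and the remaining
    # values are sorted normally in the requested direction.
    nulls = [v for v in values if v is None]
    rest = sorted((v for v in values if v is not None), reverse=descending)
    return nulls + rest if nulls_first else rest + nulls
```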
Robert Bedell wrote on Thu, 18.12.2003 at 01:02:
These are not simple projects, I know.
Might be a tad ambitious for your first venture into backend hacking...
I agree completely. I'm not purporting to jump in quite that quickly, but
it is something I would like to see added.
I guess that by adding hash aggregates Tom solved most of the problems of
adding ROLLUP, CUBE and GROUPING SETS.
OTOH, I'm not sure if hash aggregates can already spill to disk if not
enough memory is available for keeping them all. If not, then adding this
capability would be a great push in that direction.
Robert Bedell wrote on Thu, 18.12.2003 at 01:55:
I guess that by adding hash aggregates Tom solved most of the problems of
adding ROLLUP, CUBE and GROUPING SETS.
OTOH, I'm not sure if hash aggregates can already spill to disk if not
enough memory is available for keeping them all. If not, then adding this
capability would be a great push in that direction.
It seems that the longest GROUPING SET and all its left-continuous
subsets could be collected from the sorted scan and the rest from hash
aggregates.
GROUPING SET () will always need a hash ;)
To optimise any further would require use of statistics data, and is
probably not a good idea.
For pure ROLLUP one could shortcut the split-into-groups and
put-together-again process, as ROLLUP is already doable from a single
sorted scan.
Actually, as long as the grouping sets are all left-continuous subsets of
the longest grouping set, it's doable from a single sorted scan.
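The single-sorted-scan claim can be illustrated with a small Python sketch (illustrative only): keep one running subtotal per ROLLUP level, and emit and reset a level whenever its key prefix changes, so memory stays proportional to the number of grouping columns rather than the number of groups.

```python
def rollup_sorted_scan(sorted_rows, group_cols, value_col):
    # One pass over rows already sorted by group_cols. subtotal[k]
    # accumulates the group defined by the first k columns; when that
    # prefix changes, level k is emitted and reset. The grand total
    # (subtotal[0]) is emitted once at the end.
    n = len(group_cols)
    subtotal = [0] * (n + 1)
    current = None               # full key of the currently open groups
    emitted = []                 # (prefix_key, sum), finest level first
    for row in sorted_rows:
        key = tuple(row[c] for c in group_cols)
        if current is not None:
            for level in range(n, 0, -1):
                if key[:level] != current[:level]:
                    emitted.append((current[:level], subtotal[level]))
                    subtotal[level] = 0
        for level in range(n + 1):
            subtotal[level] += row[value_col]
        current = key
    if current is not None:
        for level in range(n, 0, -1):
            emitted.append((current[:level], subtotal[level]))
        emitted.append(((), subtotal[0]))
    return emitted
```

Note that the grand total falls out of the same scan here; the left-continuous GROUPING SETS case works the same way, just skipping the levels that are not requested.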
Hannu Krosing [EMAIL PROTECTED] writes:
OTOH, I'm not sure if hash aggregates can already spill to disk if not
enough memory is available for keeping them all.
They do not, which is something it'd be good to fix, since if the
planner drastically underestimates the number of groups, you could run
out of memory.
Robert Bedell [EMAIL PROTECTED] writes:
1) When does the optimizer set the nodeAgg plan to HASHED?
See grouping_planner() in src/backend/optimizer/plan/planner.c,
particularly the logic around use_hashed_grouping.
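Conceptually (this is a loose illustration, not PostgreSQL's actual cost model, and the byte figures are made up), the decision boils down to whether the estimated hash table fits in the memory budget, since the executor currently has no spill fallback:

```python
def choose_hashed_grouping(est_groups, est_entry_bytes, sort_mem_kb):
    # Illustrative only: hashed aggregation is safe to pick only when
    # the whole table of per-group entries is expected to fit within
    # the sort_mem budget; otherwise fall back to sort + group.
    est_table_bytes = est_groups * est_entry_bytes
    return est_table_bytes <= sort_mem_kb * 1024
```

The fragility discussed in this thread lives in `est_groups`: if that estimate is far too low, the check passes, a hashed plan is chosen, and the executor then grows the table without bound.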
2) What mechanism would be best to use for storing the data on disk?
Thanks for the pointers!
The documentation points to the
'sort_mem' parameter for this, but the code doesn't look to actually
implement that yet.
Well, yeah, that's sort of exactly the point ... it's considered during
planning but the executor code has no fallback if the planner guesses
wrong.
Also ...
Robert Bedell [EMAIL PROTECTED] writes:
... I think that LookupTupleHashEntry
is the best place to put the code to spill over to disk, since that is
used only by the Group, Agg, and Subplan executor nodes.
That's the right area to be looking at, but I don't think you can expect
to do a decent job with localized hacking in LookupTupleHashEntry. That
function's API is predicated on the assumption that you have random
access to any entry in the hash table --- which stops being true as soon
as you start spilling entries to disk.
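One classic way around that assumption is Grace-style hash partitioning: split the input by a hash of the key into on-disk partitions first, then aggregate each partition independently, so no pass ever needs random access to a spilled hash-table entry. A hypothetical Python sketch (not a proposal for the actual executor code):

```python
import pickle
import tempfile

def partitioned_agg(rows, num_partitions):
    # Phase 1: route each (key, value) row to a temp file chosen by a
    # hash of its key, so all rows for a given key land in one file.
    parts = [tempfile.TemporaryFile() for _ in range(num_partitions)]
    for key, value in rows:
        pickle.dump((key, value), parts[hash(key) % num_partitions])
    # Phase 2: aggregate each partition with its own small in-memory
    # hash table; partitions are disjoint, so results merge trivially.
    results = {}
    for f in parts:
        f.seek(0)
        table = {}
        while True:
            try:
                key, value = pickle.load(f)
            except EOFError:
                break
            table[key] = table.get(key, 0) + value
        results.update(table)
        f.close()
    return results
```

The point is the access pattern: each partition file is only ever written sequentially and then read sequentially, which is exactly what a spilling executor can afford.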