Simon Riggs <[EMAIL PROTECTED]> writes:
> Any holes in that thinking?
Only that it's about five times more complicated than is currently known
to be necessary ;-). How about we just implement the dynamic spill to
disk first, and not bother with the other stuff until we see problems in
the field?
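The "dynamic spill to disk" being proposed can be illustrated with a toy sketch (Python, purely illustrative — the group limit, spill file handling, and recursion are assumptions, not PostgreSQL's implementation): when the in-memory hash table fills up, tuples for keys not already in the table are deferred to a temp file and aggregated in a later pass.

```python
import pickle
import tempfile

def hash_aggregate(tuples, max_groups_in_memory=4):
    """Aggregate (key, value) pairs into per-key sums, spilling
    tuples for overflow keys to a temp file for a later pass."""
    results = {}
    spill = tempfile.TemporaryFile()
    spilled = False
    for key, value in tuples:
        if key in results:
            results[key] += value
        elif len(results) < max_groups_in_memory:
            results[key] = value
        else:
            # Hash table full: defer this tuple instead of failing.
            pickle.dump((key, value), spill)
            spilled = True
    if spilled:
        spill.seek(0)
        deferred = []
        while True:
            try:
                deferred.append(pickle.load(spill))
            except EOFError:
                break
        # Spilled keys are disjoint from in-memory keys (a key
        # already in the table is always aggregated in place),
        # so a recursive pass over the spill file is safe.
        results.update(hash_aggregate(deferred, max_groups_in_memory))
    spill.close()
    return results
```

Nothing here models batching or re-partitioning; it only shows why the approach degrades gracefully instead of running out of memory.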
On Thu, 2006-01-19 at 18:38 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > This seems to lead to a super-geometric progression in the number of
> > files required,
>
> But we double the number of batches at each step, so there are going to
> be at most 20 or so levels, and that's only assuming a *horridly* wrong
> initial guess by the planner.
On Tue, 2006-01-17 at 21:43 +, Simon Riggs wrote:
> On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote:
> > I was thinking along the lines of having multiple temp files per hash
> > bucket. If you have a tuple that needs to migrate from bucket M to
> > bucket N, you know that it arrived before
Simon Riggs <[EMAIL PROTECTED]> writes:
> This seems to lead to a super-geometric progression in the number of
> files required,
But we double the number of batches at each step, so there are going to
be at most 20 or so levels, and that's only assuming a *horridly* wrong
initial guess by the planner.
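The "at most 20 or so levels" claim is just the arithmetic of doubling, which a tiny sketch (illustrative, with hypothetical numbers) makes concrete:

```python
def doublings_needed(estimated_batches, required_batches):
    """How many times must the batch count double to reach the
    required number of batches, starting from the estimate?"""
    n, steps = estimated_batches, 0
    while n < required_batches:
        n *= 2
        steps += 1
    return steps

# Even if the planner estimates 1 batch and a million are needed,
# only 20 doublings occur, since 2**20 = 1,048,576.
```

So the number of re-partitioning levels grows logarithmically in the size of the misestimate, not linearly.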
On Tue, 2006-01-17 at 21:43 +, Simon Riggs wrote:
> OK My interest was in expanding the role of HashAgg, which as Rod
> says can be used to avoid the sort, so the overlap between those ideas
> was low anyway.
Am I right in thinking that HashAgg would almost always be quicker than
SortAgg,
On Tue, 2006-01-17 at 14:41 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
> >> The tricky part is to preserve the existing guarantee that tuples are
> >> merged into their aggregate in arrival order.
>
> > You almost had me there... but there isn't any "arrival order".
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
>> The tricky part is to preserve the existing guarantee that tuples are
>> merged into their aggregate in arrival order.
> You almost had me there... but there isn't any "arrival order".
The fact that it
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
> The tricky part is to preserve the existing guarantee that tuples are
> merged into their aggregate in arrival order. (This does not matter for
> the standard aggregates but it definitely does for custom aggregates,
> and there will be unhappy
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote:
>> But our idea of the number of batches needed can change during that
>> process, resulting in some inner tuples being initially assigned to the
>> wrong temp file. This would also be true for hashagg.
>
On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Sure hash table is dynamic, but we read all inner rows to create the
> > hash table (nodeHash) before we get the outer rows (nodeHJ).
>
> But our idea of the number of batches needed can change during
Greg Stark <[EMAIL PROTECTED]> writes:
> For a hash aggregate would it be possible to rescan the original table
> instead of spilling to temporary files?
Sure, but the possible performance gain is finite and the possible
performance loss is not. The "original table" could be an extremely
expensive
Tom Lane <[EMAIL PROTECTED]> writes:
> > Why would we continue to dynamically build the hash table after the
> > start of the outer scan?
>
> The number of tuples written to a temp file might exceed what we want to
> hold in memory; we won't detect this until the batch is read back in,
> and in
Simon Riggs <[EMAIL PROTECTED]> writes:
> Sure hash table is dynamic, but we read all inner rows to create the
> hash table (nodeHash) before we get the outer rows (nodeHJ).
But our idea of the number of batches needed can change during that
process, resulting in some inner tuples being initially assigned to the
wrong temp file. This would also be true for hashagg.
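The mis-assignment problem is easy to see with a toy batch function (an illustration only — `hash % nbatch` is not PostgreSQL's exact batch formula): when nbatch doubles, every tuple either keeps its batch number or moves to old batch + old nbatch, so tuples written under the old nbatch may now belong in a file that didn't exist yet.

```python
def batch_for(hashval, nbatch):
    # Toy batch assignment: modulo the current batch count.
    return hashval % nbatch

# With nbatch doubled from 4 to 8, a tuple hashing to 6 sat in
# batch 6 % 4 == 2 under the old scheme but belongs in batch 6
# under the new one: it was initially assigned to the "wrong"
# temp file and must migrate when its old batch is reloaded.
moved = [h for h in range(16) if batch_for(h, 4) != batch_for(h, 8)]
```

The saving grace is the regularity: a tuple never moves to an arbitrary batch, only from batch `b` to batch `b + old_nbatch`.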
On Mon, 2006-01-16 at 14:43 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > For HJ we write each outer tuple to its own file-per-batch in the order
> > they arrive. Reading them back in preserves the original ordering. So
> > yes, caution required, but I see no difficulty, just reworking the HJ
> > code (nodeHashjoin and nodeHash).
Simon Riggs <[EMAIL PROTECTED]> writes:
> For HJ we write each outer tuple to its own file-per-batch in the order
> they arrive. Reading them back in preserves the original ordering. So
> yes, caution required, but I see no difficulty, just reworking the HJ
> code (nodeHashjoin and nodeHash). What
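The order-preservation argument can be sketched in a few lines (illustrative Python, not the nodeHashjoin code; the hash function and in-memory "files" are stand-ins): appending each tuple to its batch in arrival order means reading a batch back yields its tuples in their original relative order.

```python
from collections import defaultdict

def partition_preserving_order(tuples, nbatch, hash_fn=hash):
    """Append each tuple to its batch in arrival order; each
    list stands in for one temp file per batch."""
    batches = defaultdict(list)
    for t in tuples:
        batches[hash_fn(t) % nbatch].append(t)
    return batches
```

Ordering is only preserved *within* a batch, of course — across batches the interleaving is lost, which is why this works for hash join but needs care for order-sensitive aggregates.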
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> >> A couple of days ago I found myself wanting to aggregate 3 Billion
> >> tuples down to 100 Million tuples based on an integer key with six
> >> integer values -- six sum()'s.
On Mon, 2006-01-16 at 09:42 -0500, Rod Taylor wrote:
> On Mon, 2006-01-16 at 08:32 +, Simon Riggs wrote:
> > On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> > >
> > A question: Are the rows in your 3 B row table clumped together based
> > upon the 100M row key? (or *mostly* so) We might
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
>> A couple of days ago I found myself wanting to aggregate 3 Billion
>> tuples down to 100 Million tuples based on an integer key with six
>> integer values -- six sum()'s.
> There is already hash table
On Mon, 2006-01-16 at 08:32 +, Simon Riggs wrote:
> On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> > A couple of days ago I found myself wanting to aggregate 3 Billion
> > tuples down to 100 Million tuples based on an integer key with six
> > integer values -- six sum()'s.
> >
> > PostgreSQL ran out of memory with its Hash Aggregator and doing an old
> > style Sort & Sum took a fair amount of time to complete
On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> A couple of days ago I found myself wanting to aggregate 3 Billion
> tuples down to 100 Million tuples based on an integer key with six
> integer values -- six sum()'s.
>
> PostgreSQL ran out of memory with its Hash Aggregator and doing an old
> style Sort & Sum took a fair amount of time to complete
A couple of days ago I found myself wanting to aggregate 3 Billion
tuples down to 100 Million tuples based on an integer key with six
integer values -- six sum()'s.
PostgreSQL ran out of memory with its Hash Aggregator and doing an old
style Sort & Sum took a fair amount of time to complete (cancelled).
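The workload Rod describes reduces, per group, to six running sums keyed by an integer — roughly the per-group state HashAgg has to keep. A hypothetical sketch (illustrative Python; column names and shapes are assumptions):

```python
def six_sums(rows):
    """rows: iterable of (key, v1, ..., v6) tuples.
    Returns {key: [sum1, ..., sum6]}.  With 100 million distinct
    keys, holding this per-group state for every key at once is
    exactly what overflowed memory in the report above."""
    groups = {}
    for key, *values in rows:
        state = groups.setdefault(key, [0] * 6)
        for i, v in enumerate(values):
            state[i] += v
    return groups
```

At 100M groups times six accumulators (plus hash-table overhead), the state runs to many gigabytes — hence the interest in letting HashAgg spill rather than fall back to sort-based aggregation.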