Default setting for enable_hashagg_disk

2020-04-07 Thread Jeff Davis
This is just a placeholder thread for an open item that I'm adding to the Open Items list. We can make a decision later. Now that we have Disk-based Hash Aggregation, there are a lot more situations where the planner can choose HashAgg. The enable_hashagg_disk GUC, if set to true, chooses HashAgg
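To make the switch concrete, a minimal sketch of how the GUC is exercised (the table t(a int) is a hypothetical example, and this assumes a build that exposes enable_hashagg_disk):

    -- Hypothetical table t(a int); assumes enable_hashagg_disk is available in this build.
    SET work_mem = '4MB';
    SET enable_hashagg_disk = on;   -- HashAgg may be chosen purely on cost, even if it will spill
    EXPLAIN SELECT a, count(*) FROM t GROUP BY a;
    SET enable_hashagg_disk = off;  -- HashAgg offered only when it is expected to fit in work_mem
    EXPLAIN SELECT a, count(*) FROM t GROUP BY a;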

Re: Default setting for enable_hashagg_disk

2020-04-07 Thread Justin Pryzby
On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote: > The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on > costing. If false, it only generates a HashAgg path if it thinks it will fit > in work_mem, similar to the old behavior (though it will now spill to disk if > the pl

Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Tomas Vondra
On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote: On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote: The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on costing. If false, it only generates a HashAgg path if it thinks it will fit in work_mem, similar to th

Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 01:48:55PM +0200, Tomas Vondra wrote: > On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote: > > On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote: > > > The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on > > > costing. If false, it only

Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Jeff Davis
On Thu, 2020-04-09 at 12:24 -0500, Justin Pryzby wrote: > Also.. there's no such thing as enable_groupagg? Unless I've been > missing out > on something. I thought about adding that, and went so far as to make a patch. But it didn't seem right to me -- the grouping isn't what takes the time, it's

Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Robert Haas
On Thu, Apr 9, 2020 at 7:49 AM Tomas Vondra wrote: > Is it really any different from our enable_* GUCs? Even if you do e.g. > enable_sort=off, we may still do a sort. Same for enable_groupagg etc. I think it's actually pretty different. All of the other enable_* GUCs disable an entire type of pla

Re: Default setting for enable_hashagg_disk

2020-04-09 Thread Jeff Davis
On Thu, 2020-04-09 at 15:26 -0400, Robert Haas wrote: > I think it's actually pretty different. All of the other enable_* > GUCs > disable an entire type of plan node, except for cases where that > would > otherwise result in planning failure. This just disables a portion of > the planning logic fo

Re: Default setting for enable_hashagg_disk

2020-06-22 Thread Jeff Davis
On Mon, 2020-06-22 at 10:52 -0400, Robert Haas wrote: > So I feel like the really important thing here is to fix the cases > that don't come out well with default settings. ...with the caveat that perfection is not something to expect from our planner. > If we can't do that, > then the feature i

Re: Default setting for enable_hashagg_disk

2020-06-22 Thread Jeff Davis
On Mon, 2020-06-22 at 11:17 -0400, Robert Haas wrote: > I mean, that's fine, but I am trying to make a more general point > about priorities. Getting the GUCs right is a lot less important than > getting the feature right. What part of the feature are you worried we're getting wrong? Regards,

Re: Default setting for enable_hashagg_disk

2020-06-22 Thread Robert Haas
On Mon, Jun 22, 2020 at 1:30 PM Jeff Davis wrote: > On Mon, 2020-06-22 at 10:52 -0400, Robert Haas wrote: > > So I feel like the really important thing here is to fix the cases > > that don't come out well with default settings. > > ...with the caveat that perfection is not something to expect fro

Re: Default setting for enable_hashagg_disk

2020-06-22 Thread Jeff Davis
On Mon, 2020-06-22 at 15:28 -0400, Robert Haas wrote: > The weirdness is the problem here, at least for me. Generally, I > don't > like GUCs of the form give_me_the_old_strange_behavior=true I agree with all of that in general. > I don't think it necessarily implies that either. I do however have

Re: Default setting for enable_hashagg_disk

2020-06-23 Thread David Rowley
On Tue, 23 Jun 2020 at 08:24, Jeff Davis wrote: > Another way of looking at it is that the weird behavior is already > there in v12, so there are already users relying on this weird behavior > as a crutch for some other planner mistake. The question is whether we > want to: > > (a) take the weird

Re: Default setting for enable_hashagg_disk

2020-06-23 Thread Justin Pryzby
On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote: > On Tue, 23 Jun 2020 at 08:24, Jeff Davis wrote: > > Another way of looking at it is that the weird behavior is already > > there in v12, so there are already users relying on this weird behavior > > as a crutch for some other planner

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote: > On Tue, 23 Jun 2020 at 08:24, Jeff Davis wrote: > > Another way of looking at it is that the weird behavior is already > > there in v12, so there are already users relying on this weird behavior > > as a crutch for some other planner

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread David Rowley
On Wed, 24 Jun 2020 at 21:06, Bruce Momjian wrote: > I > don't remember anyone complaining about spills to disk during merge > join, so I am unclear why we would need such a control for hash join. Hash aggregate, you mean? The reason is that upgrading to PG13 can cause a performance regression

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Justin Pryzby
On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote: > On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote: > > On Tue, 23 Jun 2020 at 08:24, Jeff Davis wrote: > > > Another way of looking at it is that the weird behavior is already > > > there in v12, so there are already users

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Wed, Jun 24, 2020 at 07:38:43AM -0500, Justin Pryzby wrote: > On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote: > > It would seem merge join has almost the same complexities as the new > > hash join code, since it can spill to disk doing sorts for merge joins, > > and adjusting work

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Thu, Jun 25, 2020 at 12:24:29AM +1200, David Rowley wrote: > On Wed, 24 Jun 2020 at 21:06, Bruce Momjian wrote: > > I > > don't remember anyone complaining about spills to disk during merge > > join, so I am unclear why we would need such a control for hash join. > > Hash aggregate, you mean?

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Tom Lane
Justin Pryzby writes: > On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote: >> It would seem merge join has almost the same complexities as the new >> hash join code, since it can spill to disk doing sorts for merge joins, >> and adjusting work_mem is the only way to control that spill

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Tomas Vondra
On Wed, Jun 24, 2020 at 01:29:56PM -0400, Tom Lane wrote: Justin Pryzby writes: On Wed, Jun 24, 2020 at 05:06:28AM -0400, Bruce Momjian wrote: It would seem merge join has almost the same complexities as the new hash join code, since it can spill to disk doing sorts for merge joins, and adjust

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Tom Lane
Tomas Vondra writes: > On Wed, Jun 24, 2020 at 01:29:56PM -0400, Tom Lane wrote: >> If we feel we need something to let people have the v12 behavior >> back, let's have >> (1) enable_hashagg on/off --- controls planner, same as it ever was >> (2) enable_hashagg_spill on/off --- controls executor b
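A sketch of the two-knob scheme as proposed here (enable_hashagg is a long-standing planner GUC; enable_hashagg_spill is only the name used in this proposal and may not match what finally shipped):

    SET enable_hashagg = on;         -- planner: HashAgg paths are still considered
    SET enable_hashagg_spill = off;  -- executor (proposed knob): keep the v12 behavior and do not spill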

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Andres Freund
Hi, On 2020-06-24 14:11:57 +1200, David Rowley wrote: > 1. Statistics underestimation can cause hashagg to be selected. The > executor will spill to disk in PG13. Users may find performance > suffers as previously the query may have just overshot work_mem > without causing any OOM issues. Their I

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Andres Freund
Hi, On 2020-06-24 13:12:03 -0400, Bruce Momjian wrote: > Well, my point is that merge join works that way, and no one has needed > a knob to avoid mergejoin if it is going to spill to disk. If they are > adjusting work_mem to prevent spill of merge join, they can do the same > for hash agg. We j

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Robert Haas
On Wed, Jun 24, 2020 at 3:14 PM Andres Freund wrote: > FWIW, my gut feeling is that we'll end up having to separate the > "execution time" spilling from using plain work mem, because it'll > trigger spilling too often. E.g. if the plan isn't expected to spill, > only spill at 10 x work_mem or someth

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Andres Freund
Hi, On 2020-06-24 14:40:50 -0400, Tom Lane wrote: > Tomas Vondra writes: > > On Wed, Jun 24, 2020 at 01:29:56PM -0400, Tom Lane wrote: > >> If we feel we need something to let people have the v12 behavior > >> back, let's have > >> (1) enable_hashagg on/off --- controls planner, same as it ever w

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Andres Freund
Hi, On 2020-06-24 15:28:47 -0400, Robert Haas wrote: > On Wed, Jun 24, 2020 at 3:14 PM Andres Freund wrote: > > FWIW, my gut feeling is that we'll end up having to separate the > > "execution time" spilling from using plain work mem, because it'll > > trigger spilling too often. E.g. if the plan is

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Tomas Vondra
On Wed, Jun 24, 2020 at 12:36:24PM -0700, Andres Freund wrote: Hi, On 2020-06-24 15:28:47 -0400, Robert Haas wrote: On Wed, Jun 24, 2020 at 3:14 PM Andres Freund wrote: > FWIW, my gut feeling is that we'll end up having to separate the > "execution time" spilling from using plain work mem, becau

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Wed, Jun 24, 2020 at 11:02:10PM +0200, Tomas Vondra wrote: > > Indeed. And then perhaps we could eventually add some reporting / > > monitoring infrastructure for the cases where plan time and execution > > time memory estimate/usage widely differs. > > > > I wouldn't mind something like that

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Wed, Jun 24, 2020 at 12:19:00PM -0700, Andres Freund wrote: > Hi, > > On 2020-06-24 13:12:03 -0400, Bruce Momjian wrote: > > Well, my point is that merge join works that way, and no one has needed > > a knob to avoid mergejoin if it is going to spill to disk. If they are > > adjusting work_mem

Re: Default setting for enable_hashagg_disk

2020-06-24 Thread Bruce Momjian
On Wed, Jun 24, 2020 at 07:18:10PM -0400, Bruce Momjian wrote: > On Wed, Jun 24, 2020 at 12:19:00PM -0700, Andres Freund wrote: > > Hi, > > > > On 2020-06-24 13:12:03 -0400, Bruce Momjian wrote: > > > Well, my point is that merge join works that way, and no one has needed > > > a knob to avoid mer

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Robert Haas
On Wed, Jun 24, 2020 at 7:38 PM Bruce Momjian wrote: > I think my main point is that work_mem was not being honored for > hash-agg before, but now that PG 13 can do it, we are again allowing > work_mem not to apply in certain cases. I am wondering if our hard > limit for work_mem is the issue, an

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Wed, 2020-06-24 at 15:28 -0400, Robert Haas wrote: > On Wed, Jun 24, 2020 at 3:14 PM Andres Freund > wrote: > > FWIW, my gut feeling is that we'll end up having to separate the > > "execution time" spilling from using plain work mem, because it'll > > trigger spilling too often. E.g. if the plan

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Wed, 2020-06-24 at 12:14 -0700, Andres Freund wrote: > E.g. if the plan isn't expected to spill, > only spill at 10 x work_mem or something like that. Let's say you have work_mem=32MB and a query that's expected to use 16MB of memory. In reality, it uses 64MB of memory. So you are saying this q
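Working through the figures in this message under the proposed 10x rule: the plan is not expected to spill (16MB estimate < 32MB work_mem), so spilling would only begin past 10 x 32MB = 320MB; the actual 64MB therefore stays entirely in memory, at roughly twice work_mem.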

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Andres Freund
On 2020-06-25 09:24:52 -0700, Jeff Davis wrote: > On Wed, 2020-06-24 at 12:14 -0700, Andres Freund wrote: > > E.g. if the plan isn't expected to spill, > > only spill at 10 x work_mem or something like that. > > Let's say you have work_mem=32MB and a query that's expected to use > 16MB of memory.

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Wed, 2020-06-24 at 12:31 -0700, Andres Freund wrote: > nodeAgg.c already treats those separately: > > void > hash_agg_set_limits(double hashentrysize, uint64 input_groups, int used_bits, > Size *mem_limit, uint64 *ngroups_limit, >

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Thu, 2020-06-25 at 11:46 -0400, Robert Haas wrote: > Because the first problem is so bad, most people > set the value relatively conservatively and just live with the > performance consequences. But this also means that they have memory > left over most of the time, so the idea of letting a node

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Bruce Momjian
On Thu, Jun 25, 2020 at 11:46:54AM -0400, Robert Haas wrote: > On Wed, Jun 24, 2020 at 7:38 PM Bruce Momjian wrote: > > I think my main point is that work_mem was not being honored for > > hash-agg before, but now that PG 13 can do it, we are again allowing > > work_mem not to apply in certain cas

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Thu, 2020-06-25 at 09:37 -0700, Andres Freund wrote: > > Let's say you have work_mem=32MB and a query that's expected to use > > 16MB of memory. In reality, it uses 64MB of memory. So you are > > saying > > this query would get to use all 64MB of memory, right? > > > > But then you run ANALYZE.

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Robert Haas
On Thu, Jun 25, 2020 at 1:15 PM Jeff Davis wrote: > Unexpected things (meaning underestimates) are not independent. All the > queries are based on the same stats, so if you have a lot of similar > queries, they will all get the same underestimate at once, and all be > surprised when they need to s

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Thu, 2020-06-25 at 13:17 -0400, Bruce Momjian wrote: > Frankly, if it took me this long to get my head around this, I am > unclear how many people will understand this tuning feature enough to > actually use it. The way I think about it is that v13 HashAgg is much more consistent with the way w

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Wed, 2020-06-24 at 13:29 -0400, Tom Lane wrote: > If we feel we need something to let people have the v12 behavior > back, let's have > (1) enable_hashagg on/off --- controls planner, same as it ever was > (2) enable_hashagg_spill on/off --- controls executor by disabling > spill > > But I'm no

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Andres Freund
Hi, On 2020-06-25 10:44:42 -0700, Jeff Davis wrote: > There are only two possible paths: HashAgg and Sort+Group, and we need > to pick one. If the planner expects one to spill, it is likely to > expect the other to spill. If one spills in the executor, then the > other is likely to spill, too. (I'

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Bruce Momjian
On Thu, Jun 25, 2020 at 11:02:30AM -0700, Jeff Davis wrote: > If we have the GUCs there, then at least if someone comes to the > mailing list with a problem, we can offer them a temporary solution, > and have time to try to avoid the problem in a future release (tweaking > estimates, cost model, de

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Bruce Momjian
On Thu, Jun 25, 2020 at 11:10:58AM -0700, Jeff Davis wrote: > On Wed, 2020-06-24 at 13:29 -0400, Tom Lane wrote: > > If we feel we need something to let people have the v12 behavior > > back, let's have > > (1) enable_hashagg on/off --- controls planner, same as it ever was > > (2) enable_hashagg_s

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Bruce Momjian
On Thu, Jun 25, 2020 at 03:24:42PM -0400, Bruce Momjian wrote: > On Thu, Jun 25, 2020 at 11:10:58AM -0700, Jeff Davis wrote: > > On Wed, 2020-06-24 at 13:29 -0400, Tom Lane wrote: > > > If we feel we need something to let people have the v12 behavior > > > back, let's have > > > (1) enable_hashagg

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Andres Freund
Hi, On 2020-06-25 14:25:12 -0400, Bruce Momjian wrote: > I am still trying to get my head around why the spill is going to be so > much work to adjust for hash agg than our other spillable nodes. Aggregates are the classical case used to process large amounts of data. For larger amounts of data s

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Jeff Davis
On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote: > It is my understanding that spill of sorts is mostly read > sequentially, > while hash reads are random. Is that right? Is that not being > costed > properly? I don't think there's a major problem with the cost model, but it could probabl

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Alvaro Herrera
On 2020-Jun-25, Andres Freund wrote: > > What are people doing for those cases already? Do we have any > > real-world queries that are a problem in PG 13 for this? > > I don't know about real world, but it's pretty easy to come up with > examples. > > query: > SELECT a, array_agg(b) FROM (SELECT
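The quoted query is truncated above; a sketch of the general shape being described (row and group counts here are illustrative assumptions, not Andres's exact example):

    -- Many groups, each accumulating an array: a hash aggregate whose memory
    -- use can easily exceed a small work_mem and spill under the v13 rules.
    SELECT a, array_agg(b)
    FROM (SELECT i % 100000 AS a, i AS b
          FROM generate_series(1, 1000000) AS i) AS sub
    GROUP BY a;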

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Andres Freund
Hi, On June 25, 2020 3:44:22 PM PDT, Alvaro Herrera wrote: >On 2020-Jun-25, Andres Freund wrote: > >> > What are people doing for those cases already? Do we have any >> > real-world queries that are a problem in PG 13 for this? >> >> I don't know about real world, but it's pretty easy to come

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Alvaro Herrera
On 2020-Jun-25, Andres Freund wrote: > >My point here is that maybe we don't need to offer a GUC to explicitly > >turn spilling off; it seems sufficient to let users change work_mem so > >that spilling will naturally not occur. Why do we need more? > > That's not really a useful escape hatch, be

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Tomas Vondra
On Thu, Jun 25, 2020 at 02:28:02PM -0700, Jeff Davis wrote: On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote: It is my understanding that spill of sorts is mostly read sequentially, while hash reads are random. Is that right? Is that not being costed properly? I don't think there's a m

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Tomas Vondra
On Thu, Jun 25, 2020 at 11:16:23AM -0700, Andres Freund wrote: Hi, On 2020-06-25 10:44:42 -0700, Jeff Davis wrote: There are only two possible paths: HashAgg and Sort+Group, and we need to pick one. If the planner expects one to spill, it is likely to expect the other to spill. If one spills in

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Tomas Vondra
On Thu, Jun 25, 2020 at 09:42:33AM -0700, Jeff Davis wrote: On Wed, 2020-06-24 at 12:31 -0700, Andres Freund wrote: nodeAgg.c already treats those separately: void hash_agg_set_limits(double hashentrysize, uint64 input_groups, int used_bits, Size *mem_lim

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Tomas Vondra
On Thu, Jun 25, 2020 at 01:17:56PM -0400, Bruce Momjian wrote: On Thu, Jun 25, 2020 at 11:46:54AM -0400, Robert Haas wrote: On Wed, Jun 24, 2020 at 7:38 PM Bruce Momjian wrote: > I think my main point is that work_mem was not being honored for > hash-agg before, but now that PG 13 can do it, we

Re: Default setting for enable_hashagg_disk

2020-06-25 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote: > I'm not saying it's not beneficial to use different limits for different > nodes. Some nodes are less sensitive to the size (e.g. sorting often > gets faster with smaller work_mem). But I think we should instead have a > per-session li

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra
On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote: On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote: I'm not saying it's not beneficial to use different limits for different nodes. Some nodes are less sensitive to the size (e.g. sorting often gets faster with smaller work

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 04:44:14PM +0200, Tomas Vondra wrote: > On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote: > > On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote: > > > I'm not saying it's not beneficial to use different limits for different > > > nodes. Some nodes are

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra
On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote: On Fri, Jun 26, 2020 at 04:44:14PM +0200, Tomas Vondra wrote: On Fri, Jun 26, 2020 at 12:02:10AM -0400, Bruce Momjian wrote: > On Fri, Jun 26, 2020 at 01:53:57AM +0200, Tomas Vondra wrote: > > I'm not saying it's not beneficial to us

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Thu, Jun 25, 2020 at 1:36 PM Andres Freund wrote: > 12 28164.865 ms > > fast ssd: > HEAD 92520.680 ms > > magnetic: > HEAD 183968.538 ms > > (no reads, there's plenty enough memory. Just writes because the age / > amount thresholds for dirty data are reached) > > In the magnetic case

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote: > On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote: > > I was thinking more of being able to allocate a single value to be > > shared by all active sessions. > > Not sure I understand. What "single value" do you mean? I was

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Fri, Jun 26, 2020 at 4:00 PM Bruce Momjian wrote: > Imagine we set the cluster-wide total of work_mem to 1GB. If a session > asks for 100MB, if there are no other active sessions, it can grant the > entire 100MB. If there are other sessions running, and 500MB has > already been allocated, may

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Fri, Jun 26, 2020 at 01:53:05PM -0700, Peter Geoghegan wrote: > On Thu, Jun 25, 2020 at 1:36 PM Andres Freund wrote: > > 12 28164.865 ms > > > > fast ssd: > > HEAD 92520.680 ms > > > > magnetic: > > HEAD 183968.538 ms > > > > (no reads, there's plenty enough memory. Just writes becau

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Tomas Vondra
On Fri, Jun 26, 2020 at 07:00:20PM -0400, Bruce Momjian wrote: On Fri, Jun 26, 2020 at 07:45:13PM +0200, Tomas Vondra wrote: On Fri, Jun 26, 2020 at 12:37:26PM -0400, Bruce Momjian wrote: > I was thinking more of being able to allocate a single value to be > shared by all active sessions. Not su

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Bruce Momjian
On Sat, Jun 27, 2020 at 01:58:50AM +0200, Tomas Vondra wrote: > > Since work_mem affect the optimizer choices, I can imagine it getting > > complex since nodes would have to ask the global work_mem allocator how > > much memory it _might_ get, but then ask for final work_mem during > > execution, a

Re: Default setting for enable_hashagg_disk

2020-06-26 Thread Peter Geoghegan
On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra wrote: > I agree larger work_mem for hashagg (and thus less spilling) may mean > lower work_mem for some other nodes that are less sensitive to this. > But I think this needs to be formulated as a cost-based decision, > although I don't know how to d

Re: Default setting for enable_hashagg_disk

2020-06-27 Thread Amit Kapila
On Thu, Jun 25, 2020 at 12:59 AM Robert Haas wrote: > > So, I don't think we can wire in a constant like 10x. That's really > unprincipled and I think it's a bad idea. What we could do, though, is > replace the existing Boolean-valued GUC with a new GUC that controls > the size at which the aggreg

Re: Default setting for enable_hashagg_disk

2020-06-27 Thread Tomas Vondra
On Fri, Jun 26, 2020 at 05:24:36PM -0700, Peter Geoghegan wrote: On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra wrote: I agree larger work_mem for hashagg (and thus less spilling) may mean lower work_mem for some other nodes that are less sensitive to this. But I think this needs to be formula

Re: Default setting for enable_hashagg_disk

2020-06-28 Thread Peter Geoghegan
On Sat, Jun 27, 2020 at 3:00 AM Amit Kapila wrote: > I think the advantage of delaying it is that we > might see some real problems (like where hash aggregate is not a good > choice) which can be fixed via the costing model. I think any problem that might come up with the costing is best thought

Re: Default setting for enable_hashagg_disk

2020-06-28 Thread Peter Geoghegan
On Sat, Jun 27, 2020 at 3:41 AM Tomas Vondra wrote: > Well, there are multiple ideas discussed in this (sub)thread, one of > them being a per-query memory limit. That requires decisions how much > memory should different nodes get, which I think would need to be > cost-based. A design like that p

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Tomas Vondra
On Sun, Jun 28, 2020 at 06:39:40PM -0700, Peter Geoghegan wrote: On Sat, Jun 27, 2020 at 3:41 AM Tomas Vondra wrote: Well, there are multiple ideas discussed in this (sub)thread, one of them being a per-query memory limit. That requires decisions how much memory should different nodes get, whic

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Bruce Momjian
On Sun, Jun 28, 2020 at 05:40:16PM -0700, Peter Geoghegan wrote: > I think any problem that might come up with the costing is best > thought of as a distinct problem. This thread is mostly about the > problem of users getting fewer in-memory hash aggregates compared to a > previous release running

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 8:07 AM Tomas Vondra wrote: > Not sure I follow. Which cases do you mean when you say that 12 could > safely do them, but 13 won't? I see the following two cases: > a) Planner in 12 and 13 disagree about whether the hash table will fit > into work_mem. > > I don't quite se

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Bruce Momjian
On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote: > I have no reason to believe that the planner is any more or any less > likely to conclude that the hash table will fit in memory in v13 as > things stand (I don't know if the BufFile issue matters). > > In general, grouping estimat

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 8:29 AM Bruce Momjian wrote: > Is this something we want to codify for all node types, > i.e., choose a non-spill node type if we need a lot more than work_mem, > but then let work_mem be a soft limit if we do choose it, e.g., allow > 50% over work_mem in the executor for m

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Tomas Vondra
On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote: On Mon, Jun 29, 2020 at 8:07 AM Tomas Vondra wrote: Not sure I follow. Which cases do you mean when you say that 12 could safely do them, but 13 won't? I see the following two cases: a) Planner in 12 and 13 disagree about wheth

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Tomas Vondra
On Mon, Jun 29, 2020 at 01:31:40PM -0400, Bruce Momjian wrote: On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote: I have no reason to believe that the planner is any more or any less likely to conclude that the hash table will fit in memory in v13 as things stand (I don't know if t

Re: Default setting for enable_hashagg_disk

2020-06-29 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra wrote: > Can you give and example of what you mean by being too permissive or too > conservative? Do you mean the possibility of unlimited memory usage in > v12, and strict enforcement in v13? Yes -- that's all I meant. > IMO enforcing the work_mem li

Re: Default setting for enable_hashagg_disk

2020-07-02 Thread Peter Geoghegan
On Mon, Jun 29, 2020 at 2:46 PM Peter Geoghegan wrote: > On Mon, Jun 29, 2020 at 2:22 PM Tomas Vondra > wrote: > > I agree with this, and I'm mostly OK with having hash_mem. In fact, from > > the proposals in this thread I like it the most - as long as it's used > > both during planning and execu

Re: Default setting for enable_hashagg_disk

2020-07-08 Thread Stephen Frost
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2020-Jun-25, Andres Freund wrote: > > > >My point here is that maybe we don't need to offer a GUC to explicitly > > >turn spilling off; it seems sufficient to let users change work_mem so > > >that spilling will naturally not occu

Re: Default setting for enable_hashagg_disk

2020-07-08 Thread Jeff Davis
On Wed, 2020-07-08 at 10:00 -0400, Stephen Frost wrote: > That HashAgg previously didn't care that it was going way over > work_mem was, if anything, a bug. I think we all agree about that, but some people may be depending on that bug. > Inventing new GUCs late in the > cycle like this unde

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Stephen Frost
Greetings, * Jeff Davis (pg...@j-davis.com) wrote: > On Wed, 2020-07-08 at 10:00 -0400, Stephen Frost wrote: > > That HashAgg previously didn't care that it was going way over > > work_mem was, if anything, a bug. > > I think we all agree about that, but some people may be depending on > that

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Peter Geoghegan
On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost wrote: > > The one for the planner is already there, and it looks like we need one > > for the executor as well (to tell HashAgg to ignore the memory limit > > just like v12). > > No, ignoring the limit set was, as agreed above, a bug, and I don't > thi

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Stephen Frost
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost wrote: > > > The one for the planner is already there, and it looks like we need one > > > for the executor as well (to tell HashAgg to ignore the memory limit > > > just like v12). > > > > No, ignor

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Peter Geoghegan
On Thu, Jul 9, 2020 at 3:58 PM Stephen Frost wrote: > > That's not the only justification. The other justification is that > > it's generally reasonable to prefer giving hash aggregate more memory. > > Sure, and it's generally reasonably to prefer giving Sorts more memory > too... as long as you'v

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread David G. Johnston
On Thu, Jul 9, 2020 at 3:58 PM Stephen Frost wrote: > > > If folks > > > want to let HashAgg use more memory then they can set work_mem higher, > > > just the same as if they want a Sort node to use more memory or a > > > HashJoin. Yes, that comes with potential knock-on effects about other > >

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Stephen Frost
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Thu, Jul 9, 2020 at 3:58 PM Stephen Frost wrote: > > > That's not the only justification. The other justification is that > > > it's generally reasonable to prefer giving hash aggregate more memory. > > > > Sure, and it's generally reasonabl

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Justin Pryzby
On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote: > * Peter Geoghegan (p...@bowt.ie) wrote: > > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost wrote: > > It makes more sense than simply ignoring what our users will see as a > > simple regression. (Though I still lean towards fixing the

Re: Default setting for enable_hashagg_disk

2020-07-09 Thread Peter Geoghegan
On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost wrote: > I didn't, and don't, think it particularly relevant to the discussion, > but if you don't like the comparison to Sort then we could compare it to > a HashJoin instead- the point is that, yes, if you are willing to give > more memory to a given

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Thu, 2020-07-09 at 19:18 -0500, Justin Pryzby wrote: > Maybe pretend that Jeff implemented something called CashAgg, which > does > everything HashAgg does but implemented from scratch. Users would be > able to > tune it or disable it, and we could talk about removing HashAgg for > the next 3 >

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings, * Justin Pryzby (pry...@telsasoft.com) wrote: > On Thu, Jul 09, 2020 at 06:58:40PM -0400, Stephen Frost wrote: > > * Peter Geoghegan (p...@bowt.ie) wrote: > > > On Thu, Jul 9, 2020 at 7:03 AM Stephen Frost wrote: > > > It makes more sense than simply ignoring what our users will see as

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost wrote: > > I didn't, and don't, think it particularly relevant to the discussion, > > but if you don't like the comparison to Sort then we could compare it to > > a HashJoin instead- the point is tha

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Stephen Frost
Greetings, * Jeff Davis (pg...@j-davis.com) wrote: > In principle, Stephen is right: the v12 behavior is a bug, lots of > people are unhappy about it, it causes real problems, and it would not > be acceptable if proposed today. Otherwise I wouldn't have spent the > time to fix it. > > Similarly,

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost wrote: > > The hash_mem design (as it stands) would affect both hash join and > > hash aggregate. I believe that it makes most sense to have hash-based > > nodes under the control of a single GUC. I believe that this > > granularity will cause the leas

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Peter Geoghegan writes: > On Fri, Jul 10, 2020 at 7:17 AM Stephen Frost wrote: >> Due to the fact that we're in beta and now is not the time to be >> redesigning this feature. > Did you read the discussion? Beta is when we fix problems that testing exposes in new features. Obviously, we'd rathe

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Jeff Davis
On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote: > I looked over Peter's patch in [1], and it seems generally pretty > sane to me, though I concur with the idea that it'd be better to > define the GUC as a multiplier for work_mem. (For one thing, we > could > then easily limit it to be at least

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 10:47 AM Tom Lane wrote: > I looked over Peter's patch in [1], and it seems generally pretty > sane to me, though I concur with the idea that it'd be better to > define the GUC as a multiplier for work_mem. (For one thing, we could > then easily limit it to be at least 1.0
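For reference, a sketch of how a multiplier-style GUC of this kind is used (hash_mem_multiplier as it was ultimately added; the values are illustrative):

    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;  -- hash-based nodes may use up to 2 x work_mem = 128MB
    SHOW hash_mem_multiplier;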

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Alvaro Herrera
On 2020-Jul-10, Peter Geoghegan wrote: > * The maximum allowable value is 100.0, to protect users from > accidentally setting hash_mem_multiplier to a value intended to work > like a work_mem-style KB value (you can't provide an absolute value > like that directly). This maximum is absurdly high.

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 11:34 AM Jeff Davis wrote: > On Fri, 2020-07-10 at 13:46 -0400, Tom Lane wrote: > > I looked over Peter's patch in [1], and it seems generally pretty > > sane to me, though I concur with the idea that it'd be better to > > define the GUC as a multiplier for work_mem. (For

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:10 PM Alvaro Herrera wrote: > I'm not sure about this bit; sounds a bit like what has been qualified > as "nannyism" elsewhere. Suppose I want to give a hash table 2GB of > memory for whatever reason. If my work_mem is default (4MB) then I > cannot possibly achieve that
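Working through the example numbers: with work_mem at its default of 4MB, even the maximum multiplier of 100.0 yields only 4MB x 100 = 400MB, well short of the desired 2GB; reaching 2GB under that cap would also require raising work_mem to at least 2048MB / 100, i.e. about 20.5MB.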

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Alvaro Herrera writes: > On 2020-Jul-10, Peter Geoghegan wrote: >> * The maximum allowable value is 100.0, to protect users from >> accidentally setting hash_mem_multiplier to a value intended to work >> like a work_mem-style KB value (you can't provide an absolute value >> like that directly). Th
