Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-11-15 Thread Amit Kapila
On Mon, Nov 14, 2022 at 11:18 PM Andres Freund wrote: > > On 2022-11-14 16:06:27 +0530, Amit Kapila wrote: > > Pushed. > > Thanks. > Please find the attached patch to remove the buffer cleanup check on the new bucket page. I think we should do this only for the HEAD. Do you have any suggestions o

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-11-14 Thread Andres Freund
On 2022-11-14 16:06:27 +0530, Amit Kapila wrote: > Pushed. Thanks.

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-11-14 Thread Amit Kapila
On Tue, Nov 8, 2022 at 3:07 PM Amit Kapila wrote: > > On Mon, Nov 7, 2022 at 11:12 PM Robert Haas wrote: > > > > On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila > > wrote: > > > I am fine with any of those. Would you like to commit or do you prefer > > > me to take care of this? > > > > Sorry for

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-11-08 Thread Amit Kapila
On Mon, Nov 7, 2022 at 11:12 PM Robert Haas wrote: > > On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila wrote: > > I am fine with any of those. Would you like to commit or do you prefer > > me to take care of this? > > Sorry for not responding to this sooner. I think it's too late to do > anything ab

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-11-07 Thread Robert Haas
On Mon, Oct 31, 2022 at 11:49 PM Amit Kapila wrote: > I am fine with any of those. Would you like to commit or do you prefer > me to take care of this? Sorry for not responding to this sooner. I think it's too late to do anything about this for the current round of releases at this point, but I a

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-31 Thread Amit Kapila
On Mon, Oct 31, 2022 at 10:40 PM Robert Haas wrote: > > On Mon, Oct 31, 2022 at 7:27 AM Amit Kapila wrote: > > Agreed, I think the important point to decide is what to do for > > back-branches. We have the next minor release in a few days' time and > > this is the last release for v10. I see the

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-31 Thread Robert Haas
On Mon, Oct 31, 2022 at 7:27 AM Amit Kapila wrote: > Agreed, I think the important point to decide is what to do for > back-branches. We have the next minor release in a few days' time and > this is the last release for v10. I see the following options based on > the discussion here. > > a. Use th

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-31 Thread Amit Kapila
On Tue, Oct 18, 2022 at 8:25 PM Robert Haas wrote: > > On Mon, Oct 17, 2022 at 4:30 PM Andres Freund wrote: > > On 2022-10-17 13:34:02 -0400, Robert Haas wrote: > > Maybe just nuking the IsBufferCleanupOK call is best, I don't know. I > honestly doubt that it matters very much what we pick here.

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-18 Thread Robert Haas
On Mon, Oct 17, 2022 at 4:30 PM Andres Freund wrote: > On 2022-10-17 13:34:02 -0400, Robert Haas wrote: > > I don't feel quite as confident that not attempting a cleanup lock on > > the new bucket's primary page is OK. I think it should be fine. The > > existing comment even says it should be fine

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-17 Thread Andres Freund
On 2022-10-17 13:34:02 -0400, Robert Haas wrote: > I don't feel quite as confident that not attempting a cleanup lock on > the new bucket's primary page is OK. I think it should be fine. The > existing comment even says it should be fine. But, that comment could > be wrong, and I'm not sure that I

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-17 Thread Robert Haas
On Mon, Oct 17, 2022 at 1:02 PM Andres Freund wrote: > That's true in general, but the case of fixing a bug in one place but not in > another nearby is a different story. I agree, but I still think we shouldn't let the perfect be the enemy of the good. > > That code indeed seems stupid, because

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-17 Thread Andres Freund
Hi, On 2022-10-17 10:43:16 -0400, Robert Haas wrote: > On Fri, Oct 14, 2022 at 2:21 PM Andres Freund wrote: > > On 2022-10-14 10:40:11 +0530, Amit Kapila wrote: > > > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund wrote: > > > Fair point. How about something like: "XXX Do we really need to check

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-17 Thread Robert Haas
On Fri, Oct 14, 2022 at 2:21 PM Andres Freund wrote: > On 2022-10-14 10:40:11 +0530, Amit Kapila wrote: > > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund wrote: > > Fair point. How about something like: "XXX Do we really need to check > > for cleanup lock on the new bucket? Here, we initialize th

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-14 Thread Amit Kapila
On Fri, Oct 14, 2022 at 11:51 PM Andres Freund wrote: > > How about something like: > > XXX: This code is wrong, we're overwriting the buffer before "acquiring" the > cleanup lock. Currently this is not known to have bad consequences because > XYZ and the fix seems a bit too risky for the ba

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-14 Thread Andres Freund
Hi, On 2022-10-14 10:40:11 +0530, Amit Kapila wrote: > On Fri, Oct 14, 2022 at 2:25 AM Andres Freund wrote: > > > > > > > > Attached are two patches. The first patch is what Robert has proposed > > > with some changes in comments to emphasize the fact that cleanup lock > > > on the new bucket is

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Amit Kapila
On Fri, Oct 14, 2022 at 2:25 AM Andres Freund wrote: > > > > > Attached are two patches. The first patch is what Robert has proposed > > with some changes in comments to emphasize the fact that cleanup lock > > on the new bucket is just to be consistent with the old bucket page > > locking as we a

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Peter Geoghegan
On Thu, Oct 13, 2022 at 6:10 PM Andres Freund wrote: > My point here is a lot more mundane. The code essentially does > _hash_pageinit(), overwriting the whole page, and *then* conditionally > acquires a cleanup lock. It simply is bogus code. I understood that that was what you meant. It's easy

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Andres Freund
Hi, On 2022-10-13 17:46:25 -0700, Peter Geoghegan wrote: > On Fri, Sep 30, 2022 at 12:05 PM Andres Freund wrote: > > My problem with this approach is that the whole cleanup lock is hugely > > misleading as-is. > > While nbtree VACUUM does use cleanup locks, they don't protect the > index structur

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Peter Geoghegan
On Fri, Sep 30, 2022 at 12:05 PM Andres Freund wrote: > My problem with this approach is that the whole cleanup lock is hugely > misleading as-is. While nbtree VACUUM does use cleanup locks, they don't protect the index structure itself -- it actually functions as an interlock against concurrent

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Andres Freund
Hi, On 2022-10-06 12:44:24 +0530, Amit Kapila wrote: > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > My problem with this approach is that the whole cleanup lock is hugely > > misleading as-is. As I noted in > > https://www.postgresql.org/message-id/20220817193032.z35vdjhpzkgldrd3%40aw

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-13 Thread Amit Kapila
On Wed, Oct 12, 2022 at 4:16 PM vignesh C wrote: > > On Thu, 6 Oct 2022 at 12:44, Amit Kapila wrote: > > > > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > > > > > This issue does occasionally happen in CI, as e.g. noted in this thread: > > > https://www.postgresql.org/message-id/20220

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-12 Thread vignesh C
On Wed, 12 Oct 2022 at 16:16, vignesh C wrote: > > On Thu, 6 Oct 2022 at 12:44, Amit Kapila wrote: > > > > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > > > > > This issue does occasionally happen in CI, as e.g. noted in this thread: > > > https://www.postgresql.org/message-id/2022093

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-12 Thread vignesh C
On Thu, 6 Oct 2022 at 12:44, Amit Kapila wrote: > > On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > > > This issue does occasionally happen in CI, as e.g. noted in this thread: > > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com > > > > On 2022-08-18 15:17:47

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-10-06 Thread Amit Kapila
On Sat, Oct 1, 2022 at 12:35 AM Andres Freund wrote: > > This issue does occasionally happen in CI, as e.g. noted in this thread: > https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com > > On 2022-08-18 15:17:47 +0530, Amit Kapila wrote: > > I agree with you that getting rid

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-09-30 Thread Andres Freund
Hi, This issue does occasionally happen in CI, as e.g. noted in this thread: https://www.postgresql.org/message-id/20220930185345.GD6256%40telsasoft.com On 2022-08-18 15:17:47 +0530, Amit Kapila wrote: > I agree with you that getting rid of the clean-up lock on the new > bucket is a more invasive

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-18 Thread Amit Kapila
On Wed, Aug 17, 2022 at 5:55 PM Robert Haas wrote: > > Regarding the question of whether we need a cleanup lock on the new > bucket I am not really seeing the advantage of going down that path. > Simply fixing this code to take a cleanup lock instead of hoping that > it always gets one by accident

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi, On 2022-08-17 15:21:55 -0400, Robert Haas wrote: > On Wed, Aug 17, 2022 at 2:45 PM Andres Freund wrote: > > Given that the cleanup locks in question are "taken" *after* re-initializing > > the page, I'm doubtful that's a sane path forward. It seems quite likely to > > mislead somebody to rely

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Robert Haas
On Wed, Aug 17, 2022 at 2:45 PM Andres Freund wrote: > On 2022-08-17 08:25:06 -0400, Robert Haas wrote: > > Regarding the question of whether we need a cleanup lock on the new > > bucket I am not really seeing the advantage of going down that path. > > Simply fixing this code to take a cleanup loc

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi, On 2022-08-17 08:25:06 -0400, Robert Haas wrote: > Regarding the question of whether we need a cleanup lock on the new > bucket I am not really seeing the advantage of going down that path. > Simply fixing this code to take a cleanup lock instead of hoping that > it always gets one by accident

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Andres Freund
Hi, On 2022-08-17 10:18:14 +0530, Amit Kapila wrote: > > Looking at the non-recovery code makes me even more suspicious: > > > > /* > > * Physically allocate the new bucket's primary page. We want to > > do this > > * before changing the metapage's mapping info, in case

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-17 Thread Robert Haas
On Tue, Aug 16, 2022 at 8:38 PM Andres Freund wrote: > I don't think we can defend against lwlock deadlocks where somebody doesn't > follow the AM's deadlock avoidance strategy. That's a good way of putting it. Tom seems to be postulating that maybe someone can use random tools that exist to take

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Amit Kapila
On Wed, Aug 17, 2022 at 6:27 AM Andres Freund wrote: > > Hi, > > On 2022-08-16 19:55:18 -0400, Tom Lane wrote: > > Robert Haas writes: > > > What sort of random things would someone do with pageinspect functions > > > that would hold buffer pins on one buffer while locking another one? > > > The

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Andres Freund
Hi, On 2022-08-16 19:55:18 -0400, Tom Lane wrote: > Robert Haas writes: > > What sort of random things would someone do with pageinspect functions > > that would hold buffer pins on one buffer while locking another one? > > The functions in hashfuncs.c don't even seem like they would access > > m

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Andres Freund
Hi, On 2022-08-16 17:02:27 -0400, Tom Lane wrote: > Robert Haas writes: > > I had that thought too, but I don't *think* it's the case. This > > function acquires a lock on the oldest bucket page, then on the new > > bucket page. We could deadlock if someone who holds a pin on the new > > bucket p

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Tom Lane
Robert Haas writes: > What sort of random things would someone do with pageinspect functions > that would hold buffer pins on one buffer while locking another one? > The functions in hashfuncs.c don't even seem like they would access > multiple buffers in total, let alone at overlapping times. And

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Robert Haas
On Tue, Aug 16, 2022 at 5:02 PM Tom Lane wrote: > Robert Haas writes: > > I had that thought too, but I don't *think* it's the case. This > > function acquires a lock on the oldest bucket page, then on the new > > bucket page. We could deadlock if someone who holds a pin on the new > > bucket pag

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Tom Lane
Robert Haas writes: > I had that thought too, but I don't *think* it's the case. This > function acquires a lock on the oldest bucket page, then on the new > bucket page. We could deadlock if someone who holds a pin on the new > bucket page tries to take a content lock on the old bucket page. But

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Robert Haas
On Wed, Aug 10, 2022 at 1:28 AM Andres Freund wrote: > I assume this is trying to defend against some sort of deadlock by not > actually getting a cleanup lock (by passing get_cleanup_lock = true to > XLogReadBufferForRedoExtended()). I had that thought too, but I don't *think* it's the case. Thi

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-11 Thread Andres Freund
Hi, On 2022-08-10 14:52:36 +0530, Amit Kapila wrote: > I think this could be the probable reason for failure though I didn't > try to debug/reproduce this yet. AFAIU, this is possible during > recovery/replay of WAL record XLOG_HASH_SPLIT_ALLOCATE_PAGE as via > XLogReadBufferForRedoExtended, we ca

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-11 Thread vignesh C
ket_flag, true); > > > > if (!IsBufferCleanupOK(newbuf)) > > > > elog(PANIC, "hash_xlog_split_allocate_page: failed to > > > > acquire cleanup lock"); > > > > > > > > Why do we just crash if we don't already

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-10 Thread Robert Haas
On Wed, Aug 10, 2022 at 12:39 AM Thomas Munro wrote: > Here's an email about that: > > https://www.postgresql.org/message-id/CAE9k0P=OXww6RQCGrmDNa8=L3EeB01SGbYuP23y-qZJ=4td...@mail.gmail.com Hmm. If I'm reading that email correctly, it indicates that I noticed this problem before commit and aske

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-10 Thread Amit Kapila
t; > > newbuf = XLogInitBufferForRedo(record, 1); > > > _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket, > > > xlrec->new_bucket_flag, true); > > > if (!IsBufferCleanupOK(newbuf)) > > > elog(PANIC, "has

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Thomas Munro
On Wed, Aug 10, 2022 at 5:35 PM Thomas Munro wrote: > 2021 Or, rather, 14 days into 2022 :-)

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Thomas Munro
On Wed, Aug 10, 2022 at 5:28 PM Andres Freund wrote: > I don't think it's a safe assumption that nobody would hold a pin on such a > page during recovery. While not the case here, somebody else could have used > pg_prewarm to read it in. > > But also, the checkpointer or bgwriter could have it tem

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Andres Freund
xlrec->new_bucket, xlrec->new_bucket, > > xlrec->new_bucket_flag, true); > > if (!IsBufferCleanupOK(newbuf)) > > elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire > > cleanup lock"); > > > > Why do we just crash i

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Thomas Munro
>new_bucket, xlrec->new_bucket, > > xlrec->new_bucket_flag, true); > > if (!IsBufferCleanupOK(newbuf)) > > elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire > > cleanup lock"); > > > > Why do we just crash if w

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Mark Dilger
bucket_flag, true); > if (!IsBufferCleanupOK(newbuf)) > elog(PANIC, "hash_xlog_split_allocate_page: failed to acquire > cleanup lock"); > > Why do we just crash if we don't already have a cleanup lock? That can't be > right. Or is there supposed

hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-09 Thread Andres Freund
01:46:20.731 GMT [2212][startup] PANIC: hash_xlog_split_allocate_page: failed to acquire cleanup lock 2022-08-10 01:46:20.731 GMT [2212][startup] CONTEXT: WAL redo at 0/7A6EED8 for Hash/SPLIT_ALLOCATE_PAGE: new_bucket 31, meta_page_masks_updated F, issplitpoint_changed F; blkref #0: rel 1663/