A failure in 031_recovery_conflict.pl on Debian/s390x

2023-08-04 Christoph Berg
Re: Noah Misch > On Tue, Jul 25, 2023 at 01:56:41PM +0530, Bharath Rupireddy wrote: > > I've observed the following failure once in one of my Cirrus CI runs > > on Windows Server on HEAD: > > > > timed out waiting for match: (?^:User was holding shared buffer pin > > for too long) at > > C:/cirrus

2023-10-09 Alexander Lakhin
Hi, 13.08.2023 00:00, Andres Freund wrote: Hi, On 2023-08-12 15:50:24 +1200, Thomas Munro wrote: Thanks. I realised that it's easy enough to test that theory about cleanup locks by hacking ConditionalLockBufferForCleanup() to return false randomly. Then the test occasionally fails as describ

2023-10-14 Alexander Lakhin
13.08.2023 00:00, Andres Freund wrote: On 2023-08-12 15:50:24 +1200, Thomas Munro wrote: Thanks. I realised that it's easy enough to test that theory about cleanup locks by hacking ConditionalLockBufferForCleanup() to return false randomly. Then the test occasionally fails as described. Seems

2023-08-28 Christoph Berg
Re: Andres Freund > > Thanks. I realised that it's easy enough to test that theory about > > cleanup locks by hacking ConditionalLockBufferForCleanup() to return > > false randomly. Then the test occasionally fails as described. Seems > > like we'll need to fix that test, but it's not evidence o

2023-08-28 Thomas Munro
On Tue, Aug 29, 2023 at 1:58 AM Christoph Berg wrote: > This should be fixed before the 16 release. Here's what I was thinking of doing for this, given where we are in the release schedule: * commit the signal-refactoring patch in master only * plan to back-patch it into 16 in a later point rele

2023-08-29 Christoph Berg
Re: Thomas Munro > 2022), and then backpatched to all releases. They were disabled again > in release branches 10-14 (discussion at > https://postgr.es/m/3447060.1652032...@sss.pgh.pa.us): > > +plan skip_all => "disabled until after minor releases, due to instability"; Right: https://pgdgbuild.d

2023-09-06 Thomas Munro
I have now disabled the test in 15 and 16 (like the older branches). I'll see about getting the fixes into master today, and we can contemplate back-patching later, after we've collected a convincing volume of test results from the build farm, CI and hopefully your s390x master snapshot builds (if

2023-09-06 Thomas Munro
On Sun, Aug 13, 2023 at 9:00 AM Andres Freund wrote: > On 2023-08-12 15:50:24 +1200, Thomas Munro wrote: > > Thanks. I realised that it's easy enough to test that theory about > > cleanup locks by hacking ConditionalLockBufferForCleanup() to return > > false randomly. Then the test occasionally

2023-09-06 Thomas Munro
On Tue, Aug 8, 2023 at 11:08 AM Andres Freund wrote: > On 2023-08-07 12:57:40 +0200, Christoph Berg wrote: > > v8 worked better. It succeeded a few times (at least 12, my screen > > scrollback didn't catch more) before erroring like this: > > > [10:21:58.410](0.151s) ok 15 - startup deadlock: logf

2023-09-07 Christoph Berg
Re: Thomas Munro > I have now disabled the test in 15 and 16 (like the older branches). > I'll see about getting the fixes into master today, and we can > contemplate back-patching later, after we've collected a convincing > volume of test results from the build farm, CI and hopefully your > s390x

2023-08-04 Thomas Munro
On Sat, Aug 5, 2023 at 12:43 AM Christoph Berg wrote: > I managed to reproduce it on the shell by running the test in a loop a > few times. The failure looks like this: It's great that you can reproduce this semi-reliably! I've rebased the patch, hoping you can try it out. https://www.postgresq

2023-08-06 Christoph Berg
Re: Thomas Munro > It's great that you can reproduce this semi-reliably! I've rebased > the patch, hoping you can try it out. Unfortunately very semi, today I didn't get to the same point where it exited after test 7, but got some other timeouts. Not even sure they are related to this (?) problem

2023-08-06 Thomas Munro
On Mon, Aug 7, 2023 at 8:40 AM Christoph Berg wrote: > 2023-08-06 17:21:24.078 UTC [127] 031_recovery_conflict.pl FATAL: > unrecognized conflict mode: 7 Thanks for testing! Would you mind trying v8 from that thread? V7 had a silly bug (I accidentally deleted a 'case' label while cleaning

2023-08-07 Christoph Berg
Re: Thomas Munro > Thanks for testing! Would you mind trying v8 from that thread? V7 > had a silly bug (I accidentally deleted a 'case' label while cleaning > some stuff up, resulting in the above error...) v8 worked better. It succeeded a few times (at least 12, my screen scrollback didn't catc

2023-08-07 Andres Freund
Hi, On 2023-08-07 12:57:40 +0200, Christoph Berg wrote: > Re: Thomas Munro > > Thanks for testing! Would you mind trying v8 from that thread? V7 > > had a silly bug (I accidentally deleted a 'case' label while cleaning > > some stuff up, resulting in the above error...) > > v8 worked better. It

2023-08-08 Christoph Berg
Re: Andres Freund > Hm, that could just be a "harmless" race. Does it still happen if you apply > the attached patch in addition? Putting that patch on top of v8 made it pass 294 times before exiting like this: [08:52:34.134](0.032s) ok 1 - buffer pin conflict: cursor with conflicting pin establ

2023-08-08 Thomas Munro
On Wed, Aug 9, 2023 at 2:01 AM Christoph Berg wrote: > Putting that patch on top of v8 made it pass 294 times before exiting > like this: > > [08:52:34.134](0.032s) ok 1 - buffer pin conflict: cursor with conflicting > pin established > Waiting for replication conn standby's replay_lsn to pass 0/

2023-08-09 Christoph Berg
Re: Thomas Munro > On Wed, Aug 9, 2023 at 2:01 AM Christoph Berg wrote: > > Putting that patch on top of v8 made it pass 294 times before exiting > > like this: > > > > [08:52:34.134](0.032s) ok 1 - buffer pin conflict: cursor with conflicting > > pin established > > Waiting for replication conn

2023-08-09 Christoph Berg
Re: To Thomas Munro > 603 iterations later it hit again, but didn't log anything. (I believe > I did run "make" in the right directory.) Since that didn't seem right I'm running the tests again. There are XXX lines in the output, but it hasn't hit yet. Christoph

2023-08-10 Christoph Berg
Re: To Thomas Munro > 603 iterations later it hit again, but didn't log anything. (I believe > I did run "make" in the right directory.) This time it took 3086 iterations to hit the problem. Running c27f8621eedf7 + Debian patches + v8 + pgstat-report-conflicts-immediately.patch + the XXX logging.

2023-08-10 Thomas Munro
On Thu, Aug 10, 2023 at 9:15 PM Christoph Berg wrote: > No XXX lines this time either, but I've seen them in logfiles that > went through successfully. Hmm. Well, I think this looks like a different kind of bug then. That patch of mine is about fixing some unsafe coding on the receiving side of

2023-08-11 Christoph Berg
Re: Thomas Munro > On Thu, Aug 10, 2023 at 9:15 PM Christoph Berg wrote: > > No XXX lines this time either, but I've seen them in logfiles that > > went through successfully. > > Do you still have the data directories around from that run, so we can > see if the expected Heap2/PRUNE was actually

2023-08-11 Thomas Munro
Thanks. I realised that it's easy enough to test that theory about cleanup locks by hacking ConditionalLockBufferForCleanup() to return false randomly. Then the test occasionally fails as described. Seems like we'll need to fix that test, but it's not evidence of a server bug, and my signal hand

2023-08-12 Andres Freund
Hi, On 2023-08-12 15:50:24 +1200, Thomas Munro wrote: > Thanks. I realised that it's easy enough to test that theory about > cleanup locks by hacking ConditionalLockBufferForCleanup() to return > false randomly. Then the test occasionally fails as described. Seems > like we'll need to fix that