ane
Cc: Robert Haas ; pgsql-hack...@postgresql.org;
Thomas Munro
Subject: Re: failures in t/031_recovery_conflict.pl on CI
Hi,
On 2022-05-03 14:23:23 -0400, Tom Lane wrote:
> Andres Freund writes:
> >> So it's almost surely a timing issue, and your theory here seems plausible.
>
Hi,
On 2022-07-26 12:47:38 -0400, Tom Lane wrote:
> Alvaro Herrera writes:
> > Hey, I just noticed that these tests are still disabled. The next
> > minors are coming soon; should we wait until *those* are done and then
> > re-enable; or re-enable them now to see how they fare and then
> > re-di
Alvaro Herrera writes:
> Hey, I just noticed that these tests are still disabled. The next
> minors are coming soon; should we wait until *those* are done and then
> re-enable; or re-enable them now to see how they fare and then
> re-disable before the next minors if there's still problems we don
On 2022-May-08, Andres Freund wrote:
> On 2022-05-08 13:59:09 -0400, Tom Lane wrote:
> > No one is going to thank us for shipping a known-unstable test case.
>
> IDK, hiding failures indicating bugs isn't really better, at least if it
> doesn't look like a bug in the test. But you seem to have a
Andres Freund writes:
> On 2022-05-08 15:11:39 -0700, Andres Freund wrote:
>> But you seem to have a stronger opinion on this than me, so I'll skip the
>> entire test for now :/
> And done.
Thanks, I appreciate that.
regards, tom lane
On 2022-05-08 15:11:39 -0700, Andres Freund wrote:
> But you seem to have a stronger opinion on this than me, so I'll skip the
> entire test for now :/
And done.
Hi,
On 2022-05-08 13:59:09 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-08 11:28:34 -0400, Tom Lane wrote:
> >> Per lapwing's latest results [1], this wasn't enough. I'm again thinking
> >> we should pull the whole test from the back branches.
>
> > That failure is different fr
Andres Freund writes:
> On 2022-05-08 11:28:34 -0400, Tom Lane wrote:
>> Per lapwing's latest results [1], this wasn't enough. I'm again thinking
>> we should pull the whole test from the back branches.
> That failure is different from the earlier failures though. I don't think it's
> a timing i
Hi,
On 2022-05-08 11:28:34 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-05 23:57:28 -0400, Tom Lane wrote:
> >> Are you sure there's just one test that's failing? I haven't checked
> >> the buildfarm history close enough to be sure of that. But if it's
> >> true, disabling just
Andres Freund writes:
> On 2022-05-05 23:57:28 -0400, Tom Lane wrote:
>> Are you sure there's just one test that's failing? I haven't checked
>> the buildfarm history close enough to be sure of that. But if it's
>> true, disabling just that one would be fine (again, as a stopgap
>> measure).
>
Andres Freund writes:
> Done. Perhaps you could trigger a run on longfin, that seems to have been the
> most reliably failing animal?
No need, its cron job launched already.
regards, tom lane
On 2022-05-06 12:12:19 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I looked through all the failures I found and it's two kinds of failures, both
> > related to the deadlock test. So I'm thinking of skipping just that test as in
> > the attached.
>
> > Working on committing / bac
Andres Freund writes:
> I looked through all the failures I found and it's two kinds of failures, both
> related to the deadlock test. So I'm thinking of skipping just that test as in
> the attached.
> Working on committing / backpatching that, unless somebody suggests changes
> quickly...
WFM.
Hi,
On 2022-05-05 23:57:28 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
> >> So I reluctantly vote for removing 031_recovery_conflict.pl in the
> >> back branches for now, with the expectation that we'll fix the
> >> infrastructure and put it ba
Andres Freund writes:
> On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
>> So I reluctantly vote for removing 031_recovery_conflict.pl in the
>> back branches for now, with the expectation that we'll fix the
>> infrastructure and put it back after the current release round
>> is done.
> What about
Hi,
On 2022-05-05 23:36:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
> >> May I ask where we're at on this? Next week's back-branch release is
> >> getting uncomfortably close, and I'm still seeing various buildfarm
> >> animals erratically
Andres Freund writes:
> On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
>> May I ask where we're at on this? Next week's back-branch release is
>> getting uncomfortably close, and I'm still seeing various buildfarm
>> animals erratically failing on 031_recovery_conflict.pl.
> Looks like the proble
Hi,
On 2022-05-05 22:07:40 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Attached is a fix for the test that I think should avoid the problem. Couldn't
> > repro it with it applied, under both rr and valgrind.
>
> May I ask where we're at on this? Next week's back-branch release is
>
Andres Freund writes:
> Attached is a fix for the test that I think should avoid the problem. Couldn't
> repro it with it applied, under both rr and valgrind.
May I ask where we're at on this? Next week's back-branch release is
getting uncomfortably close, and I'm still seeing various buildfarm
Hi,
On 2022-05-03 14:23:23 -0400, Tom Lane wrote:
> Andres Freund writes:
> >> So it's almost surely a timing issue, and your theory here seems plausible.
>
> > Unfortunately I don't think my theory holds, because I actually had added a
> > defense against this into the test that I forgot about
Andres Freund writes:
> On 2022-05-03 01:16:46 -0400, Tom Lane wrote:
>> Irritatingly, it doesn't reproduce (at least not easily) in a manual
>> build on the same box.
> Odd, given how readily it seems to reproduce on the bf. I assume you built with
>> Uses -fsanitize=alignment -DWRITE_READ_PARSE_
Hi,
On 2022-05-03 01:16:46 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-05-02 23:44:32 -0400, Tom Lane wrote:
> >> I can poke into that tomorrow, but are you sure that that isn't an
> >> expectable result?
>
> > It's not expected. But I think I might see what the problem is:
> > We
On 2022-May-02, Andres Freund wrote:
> > > pgindent uses some crazy formatting nearby:
> > > SendRecoveryConflictWithBufferPin(
> > > PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
> >
> > I do not believe that that line break is pgindent's f
Andres Freund writes:
> On 2022-05-02 23:44:32 -0400, Tom Lane wrote:
>> I can poke into that tomorrow, but are you sure that that isn't an
>> expectable result?
> It's not expected. But I think I might see what the problem is:
> We wait for the FETCH (and thus the buffer pin to be acquired). But
Hi,
On 2022-05-02 23:44:32 -0400, Tom Lane wrote:
> Andres Freund writes:
> > I ended up committing the extension of the test first, before the fix. I think
> > that's the cause of the failure on longfin on serinus. Let's hope the
> > situation improves with the now also committed (and backp
Andres Freund writes:
> I ended up committing the extension of the test first, before the fix. I think
> that's the cause of the failure on longfin on serinus. Let's hope the
> situation improves with the now also committed (and backpatched) fix.
longfin's definitely not very happy: four out of s
Hi,
On 2022-04-29 19:26:59 -0400, Tom Lane wrote:
> Andres Freund writes:
> > - The test uses pump_until() and wait_for_log(), which don't exist in the
> > backbranches. For now I've just inlined the implementation, but I guess we
> > could also backpatch their introduction?
>
> I'd backpatc
Andres Freund writes:
> Questions:
> - I'm planning to backpatch the test as 031_recovery_conflict.pl, even though
> preceding numbers are unused. It seems way more problematic to use a
> different number in the backbranches than have gaps?
+1
> - The test uses pump_until() and wait_for_log(
Hi,
Attached are patches for this issue.
It adds a test case for deadlock conflicts to make sure that case isn't
broken. I also tested the recovery conflict tests in the back branches, and
they work there with a reasonably small set of changes.
Questions:
- I'm planning to backpatch the test as
Hi,
On 2022-04-12 15:05:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> >> +1. This is probably more feasible given the latch infrastructure
> >> than it was when that code was first written.
>
> > What do you think about just reordering the
Andres Freund writes:
> On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
>> +1. This is probably more feasible given the latch infrastructure
>> than it was when that code was first written.
> What do you think about just reordering the disable_all_timeouts() to be
> before the got_standby_deadlock
Hi,
On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> Andres Freund writes:
> > It's been broken in different ways all the way back to 9.0, from what I can
> > see, but I didn't check every single version.
>
> > Afaics the fix is to nuke the idea of doing anything substantial in the signal
>
Andres Freund writes:
> It's been broken in different ways all the way back to 9.0, from what I can
> see, but I didn't check every single version.
> Afaics the fix is to nuke the idea of doing anything substantial in the signal
> handler from orbit, and instead just set a flag in the handler.
+
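For readers skimming the thread, the "just set a flag in the handler" idea
quoted above can be sketched roughly as follows. This is a generic POSIX
illustration under stated assumptions, not the PostgreSQL patch: the names
got_timeout_flag, timeout_handler, install_timeout_handler and
process_pending_timeout are hypothetical, SIGALRM is only a stand-in for
whatever delivers the timeout, and the counter stands in for the real work
that is unsafe to do in signal context.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: written only by the handler, consumed by normal code. */
static volatile sig_atomic_t got_timeout_flag = 0;

/* The handler does nothing except record that the signal arrived. */
void timeout_handler(int signo)
{
    (void) signo;
    got_timeout_flag = 1;
}

/* Install the handler for SIGALRM; returns 0 on success, -1 on failure. */
int install_timeout_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = timeout_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called from the ordinary wait loop, outside signal context. If the flag
 * is set, clear it and do the substantial work here (represented by
 * incrementing *work_done). Returns true if a pending timeout was handled.
 */
bool process_pending_timeout(int *work_done)
{
    if (!got_timeout_flag)
        return false;

    got_timeout_flag = 0;
    (*work_done)++;     /* stand-in for work that may take locks */
    return true;
}
```

The point of the split is that anything that might acquire a lock runs in
process_pending_timeout(), in normal control flow, so the handler cannot
interrupt code that already holds that lock and then wait on it.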
Hi,
On 2022-04-09 16:10:02 -0700, Andres Freund wrote:
> It's not that (although I still suspect it's a problem). It's a self-deadlock,
> because StandbyTimeoutHandler(), which ResolveRecoveryConflictWithBufferPin()
> *explicitly enables*, calls SendRecoveryConflictWithBufferPin(). Which does
> Ca
Hi,
On 2022-04-09 15:00:54 -0700, Andres Freund wrote:
> What are we expecting to wake the startup process up, once it does
> SendRecoveryConflictWithBufferPin()?
>
> It's likely not the problem here, because we never seem to have even reached
> that path, but afaics once we've called disable_all_
Hi,
On 2022-04-08 22:05:01 -0700, Andres Freund wrote:
> On 2022-04-08 21:55:15 -0700, Andres Freund wrote:
> > on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
> > interesting, because I ran it there dozens if not hundreds of times before
> > commit, with - I th
Hi,
On 2022-04-08 21:55:15 -0700, Andres Freund wrote:
> on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
> interesting, because I ran it there dozens if not hundreds of times before
> commit, with - I think - only cosmetic changes.
Scratch that part - I found an ins
Hi,
on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
interesting, because I ran it there dozens if not hundreds of times before
commit, with - I think - only cosmetic changes.
I've reproduced it in a private branch, with more logging. And the results are
sure interes